COVARIANCE MATRICES

By Noureddine El Karoui 

University of California, Berkeley 

We consider the asymptotic fluctuation behavior of the largest
eigenvalue of certain sample covariance matrices in the asymptotic
regime where both dimensions of the corresponding data matrix go
to infinity. More precisely, let $X$ be an $n \times p$ matrix, and let its rows
be i.i.d. complex normal vectors with mean 0 and covariance $\Sigma_p$.

We show that for a large class of covariance matrices $\Sigma_p$, the largest
eigenvalue of $X^*X$ is asymptotically distributed (after recentering
and rescaling) as the Tracy-Widom distribution that appears in the
study of the Gaussian unitary ensemble. We give explicit formulas for
the centering and scaling sequences that are easy to implement and
involve only the spectral distribution of the population covariance, $n$
and $p$.

The main theorem applies to a number of covariance models found 
in applications. For example, well-behaved Toeplitz matrices as well 
as covariance matrices whose spectral distribution is a sum of atoms 
(under some conditions on the mass of the atoms) are among the 
models the theorem can handle. Generalizations of the theorem to 
certain spiked versions of our models and a.s. results about the largest 
eigenvalue are given. We also discuss a simple corollary that does 
not require normality of the entries of the data matrix and some 
consequences for applications in multivariate statistics. 

1. Introduction. Sample covariance matrices are a fundamental tool of
multivariate statistics. In the classical setting, one starts with an $n \times p$ data
matrix $X$ and studies asymptotic properties of $S = (X - \bar{X})'(X - \bar{X})/(n - 1)$
when $p$ is fixed and $n$ grows to infinity. The classic paper [1] answered most
of the relevant questions concerning the eigenvalues of $S$ in the setting where
the rows of $X$ are i.i.d. $\mathcal{N}(\mu, \Sigma)$. It was shown in [1] that, from an eigenvalue
point of view, $S$ was a good estimator of $\Sigma$. A thorough account of the
classical case can be found in [2], Chapters 11 and 13.

Nowadays, statisticians are working with datasets of increasingly large
size, and the practical relevance of the assumption that $p$ is fixed and $n$
goes to infinity is often doubtful. It might also be counterproductive in
applications. A significant effort has therefore been made recently to
understand the asymptotic behavior of certain classical tools in multivariate
analysis, such as the largest eigenvalue of $S$, in the setting where $p$ and $n$
both grow to infinity.

Large-dimensional sample covariance matrices are also of interest in fields
other than statistics. Complex-valued matrices $X$ and their singular values
arise in various fields of application, in particular in communications
engineering; they are, for instance, objects of great interest in wireless
communications (see, e.g., [32] and [38]).

In the rest of the paper, the data matrix will always be called $X$. The
eigenvalues of $X^*X$ will be denoted $l_1 \ge l_2 \ge \cdots \ge l_p$. The population
eigenvalues, that is, the eigenvalues of $\Sigma_p$, will be called $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$.

To situate our paper in the current literature, let us recall a few results 
that have been recently obtained. When the true covariance matrix is Id 
and the entries of X are either standard complex or standard real normal 
distributed, results in [14, 24, 25] and [12] showed that 

$$\frac{l_1(X^*X) - \mu_{n,p}}{\sigma_{n,p}} \Rightarrow TW \qquad \text{as } n \text{ and } p \to \infty,$$

where $\mu_{n,p}$ and $\sigma_{n,p}$ are explicit sequences (which do not depend on whether
the real or the complex case is under consideration), and the limiting law is a 
Tracy-Widom distribution. When the entries are standard complex normal, 
the limiting law is the Tracy-Widom distribution appearing in the study of 
the Gaussian unitary ensemble (see [35]). When they are real normal, it is 
the one corresponding to the Gaussian orthogonal ensemble (see [36]). 

More recently, the paper [6] looked at finite-dimensional perturbations of
the Id covariance matrix. The authors considered so-called "spiked" covariance
models, advocated in [25], where a finite number $k$ of eigenvalues are
changed to a value different from 1 and the remaining $p - k$ eigenvalues are
fixed at 1. They discovered a very interesting phase transition phenomenon,
with the behavior of $l_1$ changing drastically depending on how far away $\lambda_1$,
the largest eigenvalue of $\Sigma_p$, is from the bulk of the spectrum of $\Sigma_p$. In their
case, this bulk was of course concentrated at 1.

In the course of their analysis, they developed powerful tools to analyze the
problem. In particular, their Proposition 2.1 (for which they also give credit
to K. Johansson) and the subsequent remarks are finite-dimensional and
valid whatever the true covariance structure. We exploit in this paper the
powerful representations obtained in [6] to handle a much more general class
of covariance matrices than finite perturbations of the Id matrix.

The motivations for doing so are many. From a theoretical standpoint,
it is somewhat unclear at this point what features, if any, of the covariance
structure of the random variables are responsible for the appearance
of Tracy-Widom laws. One might ask, for instance, whether it is the fact that the
bulk of the true eigenvalues is exactly concentrated at one point. We will
show that, intuitively, what seems to be needed is a weaker condition: the clumping
of a fraction of the eigenvalues close to the largest one.

From an applications standpoint, many covariances appearing in different
fields of science are not finite-dimensional perturbations of the Id matrix.
Block-diagonal covariance matrices are of particular interest since they are
accepted models for, say, the correlation of genes in microarray analysis (a
topic of intense statistical research at the time of writing), or the correlation
of the returns of stocks of companies in financial applications. Covariances
that are sums of atoms, for example, where a proportion $a$ of the variables have variance
$\lambda_1$ and the remaining proportion $1 - a$ have variance $\lambda_2$, are also of interest, especially in light of
Theorem 1.1b in [6]. We will come back to this in Section 4. Moreover,
covariance matrices that are also Toeplitz matrices are very natural in the
analysis of time-series data, since the covariance structure of a stationary
time series is a Toeplitz matrix.

Before we state our main theorem, we need to introduce some terminology
and set some notation. We will be working with $n \times p$ matrices $X$ whose rows
$\{X_k\}_{k=1,\ldots,n}$ are i.i.d. $\mathcal{N}_\mathbb{C}(0, \Sigma_p)$. By definition, this means that $X_k = Y_k +
iZ_k$, where $Y_k$ and $Z_k$ are independent (real) $\mathcal{N}(0, \Sigma_p/2)$. The matrix $W =
X^*X$ is then called a complex Wishart matrix, with $n$ degrees of freedom
and covariance $\Sigma_p$. It will be abbreviated $W_\mathbb{C}(\Sigma_p, n)$.

We will call the eigenvalues of $\Sigma_p$ $\lambda_j$, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$. The eigenvalues
of $W_\mathbb{C}(\Sigma_p, n)$ will be denoted $l_i$, with the same ordering convention;
that is, $l_1$ is the largest eigenvalue of $X^*X$.

It is well known in statistics that if the $X_k$ are i.i.d. $\mathcal{N}_\mathbb{C}(\mu, \Sigma_p)$, then $(X -
\bar{X})^*(X - \bar{X})$ is $W_\mathbb{C}(\Sigma_p, n - 1)$. For this reason, we will always assume that
the $X_k$'s are $\mathcal{N}_\mathbb{C}(0, \Sigma_p)$.

We are now ready to state the main theorem. 

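Theorem 1. Let us consider complex Wishart matrices $W_\mathbb{C}(\Sigma_p, n)$. Let
$\lambda_1$ be the largest eigenvalue of $\Sigma_p$ and let $\lambda_p$ be the smallest one. Let $H_p$ be
the spectral distribution of $\Sigma_p$. Let $c$ be the unique solution in $[0, 1/\lambda_1(\Sigma_p))$
of the equation

$$c = c(\Sigma_p, n, p),\quad c \in [0, 1/\lambda_1(\Sigma_p)):\quad \int \left(\frac{\lambda c}{1 - \lambda c}\right)^2 dH_p(\lambda) = \frac{n}{p}. \tag{1}$$

We assume that $n/p \ge 1$ is uniformly bounded, $\limsup \lambda_1 < \infty$, $\liminf \lambda_p > 0$
and $\limsup \lambda_1 c < 1$. We denote by $\mathcal{G}$ the class of models $\{(\Sigma_p, n, p)\}$ for which
these conditions hold. We call

$$\mu_{n,p} = \frac{1}{c}\left(1 + \frac{p}{n}\int \frac{\lambda c}{1 - \lambda c}\, dH_p(\lambda)\right), \qquad \sigma_{n,p}^3 = \frac{1}{c^3}\left(1 + \frac{p}{n}\int \left(\frac{\lambda c}{1 - \lambda c}\right)^3 dH_p(\lambda)\right). \tag{2}$$

Let $l_1$ be the largest eigenvalue of $W_\mathbb{C}(\Sigma_p, n)$, that is, $l_1 = l_1(X^*X)$, where
$X$ is an $n \times p$ matrix whose rows are i.i.d. $\mathcal{N}_\mathbb{C}(0, \Sigma_p)$. Then we have, as $n$
goes to $\infty$,

$$\frac{l_1 - n\mu_{n,p}}{n^{1/3}\sigma_{n,p}} \Rightarrow TW_2.$$

Moreover, if we denote by $F_2$ the cumulative distribution function of $TW_2$,
we can find $\varepsilon > 0$ and a continuous, nonincreasing function $C$ (that may
depend on the models under consideration and $\varepsilon$) such that

$$\forall s_0\ \exists N_0:\quad s > s_0 \text{ and } n > N_0 \implies \left|P\left(\frac{l_1 - n\mu_{n,p}}{n^{1/3}\sigma_{n,p}} \le s\right) - F_2(s)\right| \le \frac{C(s_0)\, e^{-\varepsilon s}}{n^{1/3}}.$$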
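
As an illustration of how the centering and scaling sequences in (1)-(2) can be computed in practice, here is a minimal sketch in Python. It assumes that $H_p$ is the empirical spectral distribution of a given list of population eigenvalues (so that $\int g\, dH_p = p^{-1}\sum_k g(\lambda_k)$) and solves (1) by bisection, which is legitimate because the left-hand side of (1) increases in $c$ on $[0, 1/\lambda_1)$ (see Section 3.3). The function names `solve_c` and `centering_scaling` are hypothetical. For $\Sigma_p = \mathrm{Id}$, the output should reproduce the classical centering $(\sqrt{n} + \sqrt{p})^2$ and scaling $(\sqrt{n} + \sqrt{p})(1/\sqrt{n} + 1/\sqrt{p})^{1/3}$ of the Id case.

```python
import math


def solve_c(eigs, n, iters=200):
    """Bisection for the unique c in [0, 1/lambda_1) solving (1), where H_p is
    the empirical distribution of `eigs`:
        (1/p) * sum_k (lambda_k c / (1 - lambda_k c))**2 = n / p.
    """
    p = len(eigs)
    target = n / p

    def h(x):
        total = 0.0
        for lam in eigs:
            d = 1.0 - lam * x
            if d <= 0.0:          # at or beyond the pole 1/lambda_1
                return math.inf
            total += (lam * x / d) ** 2
        return total / p

    lo, hi = 0.0, 1.0 / max(eigs)   # h(0) = 0 < target, h -> inf at 1/lambda_1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def centering_scaling(eigs, n):
    """Return (c, mu_{n,p}, sigma_{n,p}) following (1)-(2)."""
    c = solve_c(eigs, n)
    ratios = [lam * c / (1.0 - lam * c) for lam in eigs]
    # (p/n) * integral against H_p is (1/n) * sum over the p eigenvalues.
    mu = (1.0 + sum(ratios) / n) / c
    sigma3 = (1.0 + sum(r ** 3 for r in ratios) / n) / c ** 3
    return c, mu, sigma3 ** (1.0 / 3.0)


if __name__ == "__main__":
    n, p = 400, 100
    c, mu, sigma = centering_scaling([1.0] * p, n)
    root_n, root_p = math.sqrt(n), math.sqrt(p)
    print(n * mu, (root_n + root_p) ** 2)
    print(n ** (1 / 3) * sigma, (root_n + root_p) * (1 / root_n + 1 / root_p) ** (1 / 3))
```

Given these sequences, Theorem 1 suggests approximating the law of $l_1$ by that of $n\mu_{n,p} + n^{1/3}\sigma_{n,p}\,TW_2$; the bisection is simply a convenient root finder here, and any monotone solver would do.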


Using these results, their proofs, and a little bit more work, we can prove
the following corollaries:

Corollary 1. In the setting of Theorem 1, if $\{(\Sigma_p, n, p)\}$ is in $\mathcal{G}$, we
have

$$\frac{l_1}{n} - \mu_{n,p} \to 0 \quad \text{a.s.}$$

Corollary 2. In the setting of Theorem 1, if $\{(\Sigma_p, n, p)\}$ is in $\mathcal{G}$, the
$k$ largest eigenvalues of $X^*X$, properly recentered and rescaled, converge to
their Tracy-Widom counterpart.


Before we proceed, let us remind the reader that the cumulative distribution
function of $TW_2$ is known. After introducing the intermediary function
$q$ defined by

$$q''(x) = xq(x) + 2q^3(x), \qquad q(x) \sim \mathrm{Ai}(x) \text{ as } x \to \infty,$$

$F_2$ satisfies (see [35])

$$F_2(s) = \exp\left(-\int_s^\infty (x - s)\, q^2(x)\, dx\right).$$

LARGEST EIGENVALUE OF WISHART MATRICES




We will discuss in greater detail the potential usage of the theorem in
Section 4, but we want to highlight sufficient conditions under which it
applies and give a few examples before we give the proof. More examples,
additional results concerning spiked versions of matrices in $\mathcal{G}$ and a remark
about the fact that the bias of $l_1$ is (in some cases) independent of the
distributional assumptions made on the entries will be found in Section 4.

Corollary 3 (Sufficient conditions). When the following five conditions
are all satisfied, the theorem applies:

1. $n/p$ remains bounded and $n \ge p$;

2. $H_p \Rightarrow H_\infty$, in the usual weak convergence sense;

3. $\lambda_1(\Sigma_p) \to \lambda_1(\infty) = \sup \operatorname{support} H_\infty < \infty$;

4. $\lambda_p(\Sigma_p) \to \lambda_\infty(\infty) = \inf \operatorname{support} H_\infty > 0$;

5. $H_\infty$ has a density $h_\infty(\lambda)$ in a (left) neighborhood of $\lambda_1(\infty)$, and in this
neighborhood, $h_\infty(\lambda) \ge B(\lambda_1(\infty) - \lambda)$ for some $B > 0$.

As a consequence, we see that the result applies to:

• Symmetric Toeplitz matrices, with parameters $a_0, a_1, \ldots$, for which
… $< \infty$, the function

$$a:\ a(\omega) = a_0 + 2\sum_{k=1}^\infty a_k \cos(k\omega)$$

has a derivative that changes sign only a finite number of times on $[0, 2\pi]$,
and for which the distribution $F$ of $a$ does not have atoms ($F(x) =
\frac{1}{2\pi}\mathrm{Leb}\{\omega \in [0, 2\pi]: a(\omega) \le x\}$).

• Covariances that have uniformly spaced eigenvalues on an interval $[C, D]$,
as long as $C > 0$ and $D < \infty$.

Also, as shown in Appendix A.3.1, if $H_p$ has an atom of mass $\nu(p)$ at $\lambda_1(\Sigma_p)$
and $\liminf \nu(p) > 0$, then, assuming that $\limsup \lambda_1 < \infty$, $n/p$ remains bounded
and $\liminf \lambda_p(\Sigma_p) > 0$, the theorem holds.

Hence the Id case, which was investigated in [14, 24] and [25], is a special
case of our main theorem. Also, since spiked models with a "small" spike
are in $\mathcal{G}$ (see Section 4), the results of [6] showing convergence to $TW_2$ are
also a subcase of our main result.

2. Framework. As is almost classical for this problem, one tries to
represent the marginal distribution, $P(l_1/n \le x)$, as the determinant of $I -
K_{n,p}$, where $K_{n,p}$ is a trace class operator acting on $L^2([x, \infty))$. It greatly
simplifies the analysis if one is able to represent $K_{n,p}$ as the product of two
Hilbert-Schmidt operators, say $H_{n,p}J_{n,p}$. The problem is even more tractable
if the kernels of those operators have the property that $H_{n,p}(x, y) = H_{n,p}(x +
y)$, and similarly for $J_{n,p}$.

Let us mention before we proceed that we will be denoting the trace class
norm of an operator $K$ by $\|K\|_1$. Its Hilbert-Schmidt norm will be denoted
by $\|K\|_2$. An introduction to these concepts can be found in [30], Section
VI.6, or [16], Chapter 4.

2.1. Finite-dimensional representation of operators. Proposition 2.1 in
[6] and the remarks in (82)-(85) there remarkably manage to obtain all the
characteristics of the representations we wished for, in the case of completely
general $\Sigma_p$. Since the authors of [6] credit Johansson for the very elegant
proof they present, we will call this theorem Baik-Ben Arous-Johansson-Péché.
Here is what it states.

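Theorem 2 (Baik-Ben Arous-Johansson-Péché). Let us consider an
$n \times p$ matrix $X$ with rows i.i.d. $\mathcal{N}_\mathbb{C}(0, \Sigma_p)$. Let us assume without loss of
generality that $\lambda_p(\Sigma_p) > 0$. Let $\pi_j = 1/\lambda_j$. Let $q \in \mathbb{R}$ be such that $0 < q < \pi_1$.
Let $K_{n,p}$ be the operator on $L^2([s, \infty))$ with kernel

$$K_{n,p}(x, y) = \frac{n}{(2\pi i)^2}\int_\Gamma dz \int_\Sigma dw\, e^{-nx(z - q) + ny(w - q)}\,\frac{1}{w - z}\left(\frac{z}{w}\right)^n \prod_{k=1}^p \frac{\pi_k - w}{\pi_k - z}.$$

Here $\Sigma$ (resp. $\Gamma$) is a simple closed contour oriented counterclockwise and
encircling 0 (resp. $\pi_1, \ldots, \pi_p$). Then, if we denote by $l_1$ the largest eigenvalue
of $X^*X$, we have

$$P\left(\frac{l_1}{n} \le s\right) = \det(I - K_{n,p}). \tag{5}$$

Fig. 1. Graphical depiction of $\Gamma$ and $\Sigma$.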
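Moreover, $K_{n,p}$ can be rewritten as

$$K_{n,p}(x, y) = \int_0^\infty H_{n,p}(x + u)\, J_{n,p}(y + u)\, du, \tag{6}$$

with

$$H_{n,p}(x) = \frac{n}{2\pi}\int_\Gamma e^{-nx(z - q)} z^n \prod_{k=1}^p \frac{1}{\pi_k - z}\, dz, \tag{7}$$

$$J_{n,p}(y) = \frac{n}{2\pi}\int_\Sigma e^{ny(w - q)} w^{-n} \prod_{k=1}^p (\pi_k - w)\, dw. \tag{8}$$

Note that $\Sigma$ should be strictly to the left of $\Gamma$.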

We reproduce their Figure 2 as our Figure 1 for the convenience of the
reader, to give a graphical representation of $\Gamma$ and $\Sigma$. We refer the reader to
Remark 2.1 in [6] for a discussion of the meaning of $q$. For our purposes, it
will be enough to know that $q$ is essentially a free parameter that regularizes
the operators we deal with.

2.2. Recentering, rescaling and classical operator theory arguments leading
to weak convergence. Once the very important representations mentioned
in (6)-(8) are obtained, the path to showing weak convergence is classical
in this type of problem. One needs to find centering and scaling sequences
such that the recentered and rescaled version of $K_{n,p}$ converges in trace
class norm to its limit. Trace class norm plays an important role because
the determinant $\det(I - \cdot)$ is continuous with respect to that norm.

Since (6) shows that $K_{n,p} = H_{n,p}J_{n,p}$, the problem reduces to showing
convergence in Hilbert-Schmidt norm of $H_{n,p}$ and $J_{n,p}$ (once again properly
recentered and rescaled) to their limits. This comes essentially from the fact
that if $A$ and $B$ are Hilbert-Schmidt operators, $AB$ is trace class with
$\|AB\|_1 \le \|A\|_2\|B\|_2$, together with some elementary algebra.
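
The "elementary algebra" can be spelled out as follows. This is a standard argument, sketched here under the assumption that $A_n \to A$ and $B_n \to B$ in Hilbert-Schmidt norm (the subscript $n$ is illustrative notation); it uses only the triangle inequality and the bound $\|AB\|_1 \le \|A\|_2\|B\|_2$:

```latex
\|A_nB_n - AB\|_1 = \|(A_n - A)B_n + A(B_n - B)\|_1
  \le \|A_n - A\|_2\,\|B_n\|_2 + \|A\|_2\,\|B_n - B\|_2 \longrightarrow 0,
```

since $\|B_n\|_2$ stays bounded. Continuity of $\det(I - \cdot)$ in trace class norm then gives $\det(I - A_nB_n) \to \det(I - AB)$.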

The authors of [6], in their Section 2.2, prepare the rest of their paper
by recentering and rescaling the operators, already specializing to the
case of interest to them, namely finite-dimensional perturbations of the Id
matrix. We do it here for general $\Sigma_p$.

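Let us be more explicit now that we have explained the basic ideas. Because
(5) is exact in finite dimension, one has

$$P\left(\frac{l_1/n - \mu_{n,p}}{\sigma_{n,p}} \le s\right) = \det(I - S_{n,p}),$$

where $S_{n,p}$ acts on $L^2([0, \infty))$ and has kernel

$$S_{n,p}(x, y) = \sigma_{n,p}K_{n,p}\big(\mu_{n,p} + \sigma_{n,p}(x + s),\ \mu_{n,p} + \sigma_{n,p}(y + s)\big).$$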
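This is what we called earlier the recentered and rescaled operator. Because
of the representation given in (6), we see that

$$S_{n,p}(x, y) = \int_0^\infty \tilde H_{n,p}(x + s + u)\, \tilde J_{n,p}(y + s + u)\, du,$$

with

$$\tilde H_{n,p}(x) = \frac{n\sigma_{n,p}}{2\pi}\int_\Gamma e^{-n\sigma_{n,p}x(z - q)} e^{-n\mu_{n,p}(z - q)} z^n \prod_{k=1}^p \frac{1}{\pi_k - z}\, dz,$$

$$\tilde J_{n,p}(x) = \frac{n\sigma_{n,p}}{2\pi}\int_\Sigma e^{n\sigma_{n,p}x(z - q)} e^{n\mu_{n,p}(z - q)} z^{-n} \prod_{k=1}^p (\pi_k - z)\, dz.$$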

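From an operator-theoretic standpoint, the three formulae above mean
that

$$S_{n,p} = \tilde H_{n,p}\tilde J_{n,p},$$

where we now view $\tilde H_{n,p}$ and $\tilde J_{n,p}$ as operators acting on $L^2([s, \infty))$ with kernels
of the form $\tilde H_{n,p}(x, y) = \tilde H_{n,p}(x + y - s)$ (and similarly for $\tilde J_{n,p}$). Now, since $\pi_k = 1/\lambda_k$, it is clear that

$$\tilde H_{n,p}(x) = \frac{n\sigma_{n,p}}{2\pi}\det(\Sigma_p)\int_\Gamma e^{-n\sigma_{n,p}x(z - q)} e^{-n\mu_{n,p}(z - q)} z^n \prod_{k=1}^p \frac{1}{1 - z\lambda_k}\, dz$$

and

$$\tilde J_{n,p}(x) = \frac{n\sigma_{n,p}}{2\pi\det(\Sigma_p)}\int_\Sigma e^{n\sigma_{n,p}x(z - q)} e^{n\mu_{n,p}(z - q)} z^{-n} \prod_{k=1}^p (1 - \lambda_k z)\, dz.$$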

Being primarily interested in the product $\tilde H_{n,p}\tilde J_{n,p}$ and not in the individual
operators, we see (with [6]) that we have a little bit of choice in the operators
we wish to work with. In particular, we can choose to work with $\kappa_{n,p}\tilde H_{n,p}$
and $\tilde J_{n,p}/\kappa_{n,p}$ for any nonzero sequence $\kappa_{n,p}$. So we can get rid of the $\det(\Sigma_p)$
term appearing in the previous display and work with


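$$A_{n,p}(x) = -\frac{n\sigma_{n,p}}{2\pi i}\int_\Gamma e^{-n\sigma_{n,p}x(z - q)} e^{-n\mu_{n,p}(z - q)} z^n \prod_{k=1}^p \frac{1}{1 - z\lambda_k}\, dz, \tag{9}$$

$$B_{n,p}(x) = \frac{n\sigma_{n,p}}{2\pi i}\int_\Sigma e^{n\sigma_{n,p}x(z - q)} e^{n\mu_{n,p}(z - q)} z^{-n} \prod_{k=1}^p (1 - \lambda_k z)\, dz. \tag{10}$$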

We now have $S_{n,p} = A_{n,p}B_{n,p}$, where $A_{n,p}$ and $B_{n,p}$ are operators on $L^2([s, \infty))$
with kernels $A_{n,p}(x, y) = A_{n,p}(x + y - s)$ and similarly for $B_{n,p}$. Since we are
aiming to show convergence to $TW_2$, the Airy function will play a central
role in our analysis. We will denote it by Ai. Showing weak convergence of
$l_1(X^*X)$ to the Tracy-Widom law reduces to finding $\kappa_{n,p}$ and "good" $A_\infty$
and $B_\infty$ such that $\|\kappa_{n,p}A_{n,p} - A_\infty\|_2 \to 0$ and $\|B_{n,p}/\kappa_{n,p} - B_\infty\|_2 \to 0$. Since,
if we view the operators as acting on $L^2([s, \infty))$, $A_{n,p}(x, y) = A_{n,p}(x + y -
s)$, and similarly for the Airy operator, $\mathrm{Ai}(x, y) = \mathrm{Ai}(x + y - s)$, this will
essentially amount to showing that $\kappa_{n,p}A_{n,p}(x) - A_\infty(x) \to 0$ pointwise,
$A_\infty$ being a simple modification of the Airy function, and that both functions
go to 0 fast enough at $\infty$ (e.g., faster than … for some $\delta > 0$).

The operator-theoretic arguments used to prove Theorem 2 have considerably
simplified the problem, at least conceptually: we have moved from
the problem of studying an integral in $\mathbb{R}^p$ to that of analyzing a function of
one real variable. Note that this was also the case with previous studies (see
[25]), where arguments from [37] and [41] (where some of the ideas behind
Theorem 2 can be found) were used to reduce the complexity of the problem
to the same degree.

What is left to do now is very clear. We just need to find $\mu_{n,p}$, $\sigma_{n,p}$, $\Gamma$, $\Sigma$
and $\kappa_{n,p}$ such that $\kappa_{n,p}A_{n,p} - A_\infty$ and $B_{n,p}/\kappa_{n,p} - B_\infty$ go to 0 (in Hilbert-Schmidt
norm) when $n, p$ go to $\infty$, for appropriate $A_\infty$ and $B_\infty$. More details
on these functions will be found in Propositions 1 and 2. The next section
will be devoted to doing all of this.


3. Proof of the main result. A point of terminology before we proceed:
we will interchangeably call $A_{n,p}$ either the operator whose kernel is
$A_{n,p}(x + y)$ or the corresponding function. This simplifies the notation and
the exposition. If there is some ambiguity, we will say precisely whether we refer
to the operator, its kernel or the function that defines the kernel.

The strategy of the proof is the same as that of [6]. Loosely speaking, the
functions $A_{n,p}$ and $B_{n,p}$ can be viewed as integrals depending on parameters
going to $\infty$. The functions to integrate contain elements of the type $\exp(nf(z))$,
with $f$ defined in (11) below. This is a situation where one can try to use steepest descent analysis.


3.1. Focus of the analysis. The expression defining $A_{n,p}$ in (9) is somewhat
involved, but we will concentrate mostly on

$$e^{-n\mu_{n,p}(z - q)} z^n \prod_{k=1}^p \frac{1}{1 - z\lambda_k},$$

which can be rewritten as

$$\exp(nf(z)) = \exp\left(-n\mu_{n,p}(z - q) + n\log(z) - \sum_{k=1}^p \log(1 - z\lambda_k)\right)$$

wherever this expression makes sense. [We use the principal branch of the
log, $\log(z) = \log(|z|) + i\arg(z)$, $-\pi < \arg(z) < \pi$.] The sum appearing in the
definition of $f$ can be rewritten as an integral against the spectral distribution
of $\Sigma_p$, a distribution we call $H_p$, and we finally get

$$f(z) = -\mu_{n,p}(z - q) + \log(z) - \frac{p}{n}\int \log(1 - z\lambda)\, dH_p(\lambda). \tag{11}$$

It is clear that $f$ depends on $\Sigma_p$, $n$ and $p$, but we choose not to highlight this
dependence here to avoid cumbersome notation.

3.2. Heuristic connection with work on a.s. convergence. Many results
have been obtained concerning the almost sure (a.s.) convergence of different
spectral characteristics of random covariance matrices, starting with the
Marčenko-Pastur equation (see [27] and [40]). The article [3] contains a
thorough review and a nice introduction to these problems.

Of particular interest to us are results concerning the behavior of the
largest eigenvalue in the case of non-Id covariance. Classical ([27], equation
(1.15)) and more recent results (see, e.g., [4, 31]) emphasize the role of the
function

$$g_\infty(m) = -\frac{1}{m} + \frac{p}{n}\int \frac{\lambda}{1 + \lambda m}\, dH_\infty(\lambda),$$

where $H_\infty$ is the limiting spectral distribution of $\Sigma_p$, in obtaining almost
sure convergence properties of $l_1(X^*X/n)$ and $l_p(X^*X/n)$ and determining
the limiting spectral distribution of $X^*X/n$. In particular, the points $m$
where $g_\infty'(m) = 0$ intuitively play a crucial role in determining its support.
Note that doing asymptotic analysis at fixed spectral distribution and $p/n$
would lead to considering the equivalent of $g_\infty$ where $H_p$ replaces $H_\infty$.

Now, proceeding formally, we see that $f'(z) = -\mu_{n,p} + g_p(-z)$, where $g_p(m) =
-\frac{1}{m} + \frac{p}{n}\int \frac{\lambda}{1 + \lambda m}\, dH_p(\lambda)$, and hence

$$f''(z) = -g_p'(-z).$$

Since we are essentially interested in the points where $g_p'(-z) = 0$, that is, where $f''(z) = 0$,
and since $\mu_{n,p}$ will be chosen so that $f'$ also vanishes there, the heuristic
tells us that for a large class of $\Sigma_p$, the critical point of interest to us is going
to be a triple point of the function $f$ (i.e., a saddle point of order 2).

3.3. Consequences: choice of $c$, $\mu_{n,p}$ and $\sigma_{n,p}$. So it is now clear that the
solutions of the equation

$$f''(z) = 0 \iff \int \left(\frac{\lambda z}{1 - \lambda z}\right)^2 dH_p(\lambda) = \frac{n}{p}$$

are likely candidates to play a central role in the problem.

Since we are focusing on largest eigenvalue problems, it is natural to
consider for $c$ the unique solution in $[0, 1/\lambda_1(\Sigma_p))$ of this equation. In other
words,

$$c = c(\Sigma_p, n, p),\quad c \in [0, 1/\lambda_1(\Sigma_p)):\quad \int \left(\frac{\lambda c}{1 - \lambda c}\right)^2 dH_p(\lambda) = \frac{n}{p} \tag{12}$$

will play a crucial role in our analysis.

Note that, if $a > 0$, the function $x \mapsto ax/(1 - ax)$ is continuous, positive and
(strictly) increasing on $(0, 1/a)$, and so is its square. Hence $h(x) = \int (\lambda x)^2/(1 - \lambda x)^2\, dH_p(\lambda)$ is
increasing on $(0, 1/\lambda_1(\Sigma_p))$. It is also strictly convex, as a convex combination
of strictly convex functions. Since $h$ goes from 0 to $\infty$ on $[0, 1/\lambda_1(\Sigma_p))$ (recall
that $H_p$ puts mass at least $1/p$ at $\lambda_1(\Sigma_p)$), the equation $h(x) = r$ has exactly one
solution on $[0, 1/\lambda_1(\Sigma_p))$ for all $r \in \mathbb{R}^+$.
Existence and uniqueness of $c(\Sigma_p, n, p)$ are therefore proved.

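For steepest descent reasons, we also naturally "require" that $f'(c) = 0$
and hence

$$\mu_{n,p} = \frac{1}{c} + \frac{p}{n}\int \frac{\lambda}{1 - \lambda c}\, dH_p(\lambda) = \frac{1}{c}\left(1 + \frac{p}{n}\int \frac{\lambda c}{1 - \lambda c}\, dH_p(\lambda)\right).$$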


Hindsight from the analysis (see Appendix A.2) makes clear that if the
arguments are to go through, we will have

$$f'''(c) = 2\sigma_{n,p}^3 = \frac{2}{c^3}\left(1 + \frac{p}{n}\int \left(\frac{\lambda c}{1 - \lambda c}\right)^3 dH_p(\lambda)\right).$$

While this discussion does not prove anything, it provides heuristic reasons
for the not necessarily intuitive choice of the parameters $c$, $\mu$ and $\sigma$. What
is left to do is to find paths $\Gamma$ and $\Sigma$ on which we understand the behavior
of $f(z)$ and which will allow us to show convergence of $A_{n,p}$ and $B_{n,p}$ to our target
functions. Note that the paths $\Gamma$ and $\Sigma$ we will choose are functions of $\Sigma_p$,
$n$ and $p$. The fact that $f$ is real for real $z$, as well as geometric properties
of saddle points of order 2 (see [28], page 137), makes natural the choice of
lines crossing the real axis at $c$ at angles $\pi/3$ and $2\pi/3$ as starting points for $\Gamma$
and $\Sigma$.
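
The triple-point heuristic can be checked numerically. The sketch below (Python, standard library only) takes a hypothetical two-atom population spectrum, computes $c$ from (12) by bisection and $\mu_{n,p}$, $\sigma_{n,p}^3$ from the formulas above, and evaluates the closed-form derivatives of $f$ from (11) at $c$. Under these assumptions, $f'(c)$ and $f''(c)$ should vanish up to rounding and $f'''(c)$ should equal $2\sigma_{n,p}^3$; the parameter $q$ does not enter any derivative, so it is not needed.

```python
import math


def solve_c(eigs, n, iters=200):
    # Unique root of (12) in [0, 1/lambda_1) by bisection; H_p is the
    # empirical distribution of `eigs`.
    p = len(eigs)

    def lhs(x):
        if x * max(eigs) >= 1.0:
            return math.inf
        return sum((l * x / (1.0 - l * x)) ** 2 for l in eigs) / p

    lo, hi = 0.0, 1.0 / max(eigs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < n / p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def derivatives_at_c(eigs, n):
    """f'(c), f''(c), f'''(c) for f in (11), and sigma_{n,p}^3."""
    c = solve_c(eigs, n)
    # (p/n) * integral against H_p is (1/n) * sum over the eigenvalues.
    mu = 1.0 / c + sum(l / (1.0 - l * c) for l in eigs) / n
    sigma3 = (1.0 + sum((l * c / (1.0 - l * c)) ** 3 for l in eigs) / n) / c ** 3
    f1 = -mu + 1.0 / c + sum(l / (1.0 - c * l) for l in eigs) / n
    f2 = -1.0 / c ** 2 + sum(l ** 2 / (1.0 - c * l) ** 2 for l in eigs) / n
    f3 = 2.0 / c ** 3 + 2.0 * sum(l ** 3 / (1.0 - c * l) ** 3 for l in eigs) / n
    return f1, f2, f3, sigma3


eigs = [3.0] * 10 + [1.0] * 90      # hypothetical spectrum, p = 100
f1, f2, f3, sigma3 = derivatives_at_c(eigs, n=250)
print(f1, f2, f3 - 2.0 * sigma3)     # all three should be zero up to rounding
```

The vanishing of $f''(c)$ is the non-trivial part of the check: it is exactly equation (12) read through the formula for $f''$.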


3.4. About $\Gamma$. Because of a slight technical problem appearing in the
operator convergence analysis, we will not exhibit $\Gamma$ immediately but rather
a contour $\tilde\Gamma$ which is much more natural from the point of view of the analysis of the
behavior of $f$. Specifically, we will exhibit a curve $\tilde\Gamma_+$ on which $f(z)$ is well
understood. Then $\tilde\Gamma$ will just be $\tilde\Gamma = \tilde\Gamma_+ \cup \overline{\tilde\Gamma_+}$, where the bar denotes complex
conjugation. The problem is very graphical, so we will first show a drawing
of $\tilde\Gamma_+$.

We will show the following lemma: 

Lemma 1. Under the assumptions of Theorem 1, $\Re(f(z))$ is decreasing
for $z \in \Gamma_1 \cup \Gamma_2 \cup \Gamma_3$ as $\Re(z)$ increases. Also, the length of $\tilde\Gamma_+$ is uniformly
bounded. Finally, there exists $R_1 > 0$ such that $\max_{z \in \Gamma_4}\Re(f(z)) < \Re(f(d))$,
where $d = d(\Sigma_p, n, p) = c(1 + 2(-1 + 1/(\lambda_1 c))e^{i\pi/3})$.

$R_1$ is uniform with respect to our models: given a family of models $\{(\Sigma_p, n,
p)\}$ in $\mathcal{G}$, we get $\alpha_1 = \limsup \lambda_1 c$, $\gamma^2 = \limsup n/p$, together with $\liminf \lambda_p$. $R_1$ is
just a function of these parameters and not of the individual triplet $(\Sigma_p, n, p)$
we will be dealing with.

Precise definitions of the $\Gamma_j$'s will be given as they arise in the analysis. In
particular, that of $\Gamma_2$ requires a significant amount of notation and we choose
to postpone it in the interest of clarity. Here is nonetheless a summary.

We temporarily call $a$ the real part of $z$. The problem of finding $\tilde\Gamma_+$ is
divided into four parts. First, when $a < 1/\lambda_1$, we go along a line that makes
an angle of $\pi/3$ with the real axis, starting at $c$. When $1/\lambda_1 < a < 1/\lambda_p$, we
use a slightly more complicated path described in Section 3.4.2. When $1/\lambda_p$
is crossed, we go along a line that is parallel to the real axis until reaching
a value $R_1$. At this value $R_1$, we go down vertically to the real axis.

Hence we will show that one can follow, even in the general case, a path
that resembles that of Figure 4 in [6]. See Figure 2. There are two extra
difficulties in the general case. First, we have to take care of an arbitrary spectrum,
which significantly increases the technical problems. Second, crossing the
zone where $\Re(z)$ ranges over the values $1/\lambda_k$ is not a simple problem when first encountered.

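In all that follows, we will use the notation

$$\alpha = c\lambda, \quad \alpha_1 = c\lambda_1, \quad \alpha_p = c\lambda_p, \quad \gamma^2 = \frac{n}{p},$$

and we write $\mu$ for $\mu_{n,p}$. We work under the assumptions of Theorem 1, hence $0 < c < 1/\lambda_1$, $0 < \alpha < 1$,
$\limsup \alpha_1 < 1$ and $\liminf \alpha_p > 0$. Recall that

$$f(z) = -\mu(z - q) + \log(z) - \frac{1}{\gamma^2}\int \log(1 - z\lambda)\, dH_p(\lambda).$$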

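3.4.1. Behavior on $\Gamma_1$. On $\Gamma_1$, we have $z = c + te^{i\pi/3}$.

We call $t = xc$ and consider $m(x) = \Re(f(c + xce^{i\pi/3}))$. Note that "$x$ increases"
is equivalent to "$\Re(z)$ increases." This reparametrization considerably
simplifies the computations. We have

$$m(x) = -\mu c\left[1 + \frac{x}{2} - \frac{q}{c}\right] + \frac{1}{2}\log\big(c^2(1 + x + x^2)\big) - \frac{1}{2\gamma^2}\int \log\big((1 - \alpha)^2 - x\alpha(1 - \alpha) + \alpha^2x^2\big)\, dH_p(\lambda).$$

Recall that we want to show that $m'(x) < 0$, so that $m$ decreases when we
move along $\Gamma_1$ with $\Re(z)$ (or equivalently $x$) increasing. We have

$$m'(x) = -\frac{\mu c}{2} + \frac{1}{2}\,\frac{2x + 1}{1 + x + x^2} - \frac{1}{2\gamma^2}\int \frac{2\alpha^2x - \alpha(1 - \alpha)}{(1 - \alpha)^2 - x\alpha(1 - \alpha) + \alpha^2x^2}\, dH_p(\lambda).$$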


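Now remark that, by (12),

$$\gamma^2 = \int \frac{\alpha^2}{(1 - \alpha)^2}\, dH_p(\lambda)$$

and

$$\mu c\gamma^2 = \gamma^2 + \int \frac{\alpha}{1 - \alpha}\, dH_p(\lambda) = \int \left[\frac{\alpha^2}{(1 - \alpha)^2} + \frac{\alpha(1 - \alpha)}{(1 - \alpha)^2}\right] dH_p(\lambda) = \int \frac{\alpha}{(1 - \alpha)^2}\, dH_p(\lambda).$$

Therefore, $2\gamma^2 m'(x)$ is equal to

$$\begin{aligned}
&-\mu c\gamma^2 + \gamma^2\,\frac{2x + 1}{1 + x + x^2} - \int \frac{2\alpha^2x - \alpha(1 - \alpha)}{(1 - \alpha)^2 - x\alpha(1 - \alpha) + \alpha^2x^2}\, dH_p(\lambda)\\
&= \int \left[-\frac{\alpha}{(1 - \alpha)^2} + \frac{2x + 1}{1 + x + x^2}\,\frac{\alpha^2}{(1 - \alpha)^2} - \frac{2\alpha^2x - \alpha(1 - \alpha)}{(1 - \alpha)^2 - x\alpha(1 - \alpha) + \alpha^2x^2}\right] dH_p(\lambda)\\
&= \int \frac{\alpha}{(1 - \alpha)^2}\left[-1 + \frac{\alpha(2x + 1)}{1 + x + x^2} - \frac{(1 - \alpha)^2(2\alpha x - (1 - \alpha))}{(1 - \alpha)^2 - x\alpha(1 - \alpha) + \alpha^2x^2}\right] dH_p(\lambda).
\end{aligned}$$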
To simplify the problem, we note that the expression between the brackets
can be written

$$\frac{g(x, \alpha)}{(1 + x + x^2)\big((1 - \alpha)^2 - x\alpha(1 - \alpha) + \alpha^2x^2\big)}, \qquad g(x, \alpha) = \sum_{k=0}^4 C_kx^k.$$

A simple computation shows the following simplification:

$$C_0 = 0, \quad C_1 = 0, \quad C_2 = 2\alpha(\alpha - 1), \quad C_3 = \alpha(2\alpha - 1), \quad C_4 = -\alpha^2.$$

Hence

$$g(x, \alpha) = -\alpha x^2\big(2(1 - \alpha) + (1 - 2\alpha)x + \alpha x^2\big).$$
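
The "simple computation" behind $C_0, \ldots, C_4$ can be double-checked in exact rational arithmetic. The sketch below (Python, standard library only; the grid points are arbitrary choices) compares the numerator obtained by putting the bracket of $2\gamma^2m'(x)$ over the common denominator with the factored form of $g(x, \alpha)$. The difference of the two sides is a polynomial of degree at most 4 in $x$ and at most 3 in $\alpha$, so exact agreement on 6 distinct values of $x$ times 5 distinct values of $\alpha$ actually proves the identity.

```python
from fractions import Fraction as Fr


def numerator(x, a):
    # Bracket of 2 gamma^2 m'(x), multiplied by (1 + x + x^2) * D,
    # where D = (1 - a)^2 - x a (1 - a) + a^2 x^2.
    s = 1 + x + x * x
    d = (1 - a) ** 2 - x * a * (1 - a) + a * a * x * x
    return -s * d + a * (2 * x + 1) * d - (1 - a) ** 2 * (2 * a * x - (1 - a)) * s


def g_factored(x, a):
    # Claimed closed form of g(x, alpha).
    return -a * x * x * (2 * (1 - a) + (1 - 2 * a) * x + a * x * x)


xs = [Fr(k, 3) for k in range(6)]          # 6 distinct values of x
alphas = [Fr(k, 7) for k in range(1, 6)]   # 5 distinct values of alpha in (0, 1)
identity_holds = all(numerator(x, a) == g_factored(x, a) for x in xs for a in alphas)
print(identity_holds)
```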

We want $g(x, \alpha)$ to be negative, so we just have to study the polynomial
$P(x, \alpha) = 2(1 - \alpha) + (1 - 2\alpha)x + \alpha x^2$. Recall that $x \ge 0$. If $\alpha < 1/2$, all the
coefficients are positive, so the polynomial is positive for all $x \in \mathbb{R}^+$. Now the
roots, at $\alpha$ fixed, of $P(\cdot, \alpha)$ are

$$x_\pm = \frac{(2\alpha - 1) \pm \sqrt{(2\alpha - 1)^2 - 8\alpha(1 - \alpha)}}{2\alpha}.$$

The polynomial under the square root can be rewritten $h(\alpha) = 12\alpha^2 - 12\alpha +
1$. Its roots are $1/2 \pm 1/\sqrt{6}$, and it is negative between them.

Therefore, if $\alpha < 1/2 + 1/\sqrt{6} \approx 0.9$, $P(x, \alpha) > 0$ for all $x$ in $\mathbb{R}^+$. So we just
need to focus on $\alpha$'s such that $\alpha \ge 1/2 + 1/\sqrt{6}$.

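Remark that $x_+$ and $x_-$ are both positive, because $\alpha > 1/2$ (their sum $(2\alpha - 1)/\alpha$
and their product $2(1 - \alpha)/\alpha$ are both positive). We potentially have a problem (of sign) when crossing the smaller of the two
roots, which is of course $x_-$. Now note that $x_-(\alpha) \ge x_-(\alpha_1)$.

As a matter of fact, we remark that $x_-(\alpha) \le 1$ for all $\alpha$'s under consideration.
Then, for $u \le 1$ and $\alpha \le \alpha_1$, $P'(u, \alpha_1) \le P'(u, \alpha)$ (derivatives in the first variable), since $P'(u, \alpha) = 1 + 2\alpha(u - 1)$ and
$u \le 1$. Note also that $P(0, \alpha_1) \le P(0, \alpha) = 2(1 - \alpha)$. So $P(u, \alpha_1) \le P(u, \alpha)$
for $u \le 1$. Since $P(x_-(\alpha_1), \alpha_1) = 0$, we see that $x_-(\alpha_1) \le x_-(\alpha)$ if $\alpha \le \alpha_1$.
So for $u < x_-(\alpha_1)$, $P(u, \alpha) > 0$.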

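Now the only thing we need to verify to make sure that we can reach
$\Re(z) = 1/\lambda_1$ is that $x_-(\alpha_1) \ge 2(1/\alpha_1 - 1)$. Given that $\alpha_1 > 0.5 + 1/\sqrt{6}$, this
is equivalent to showing that $\sqrt{(2\alpha_1 - 1)^2 - 8\alpha_1(1 - \alpha_1)} \le 6\alpha_1 - 5$, which, after
squaring both (nonnegative) sides, is equivalent to $0 \le 24(1 - \alpha_1)^2$.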
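
The root analysis above lends itself to a quick numerical sanity check. The sketch below (Python, standard library only; the grid sizes are arbitrary choices) verifies, for a grid of values $\alpha_1 \in (1/2 + 1/\sqrt{6}, 1)$, that $x_-(\alpha_1) \le 1$, that $x_-(\alpha_1) \ge 2(1/\alpha_1 - 1)$, and that $P(u, \alpha) > 0$ for $0 \le u < x_-(\alpha_1)$ and $0 < \alpha \le \alpha_1$. A grid check is no substitute for the argument; it only guards against algebra slips.

```python
import math

ALPHA_STAR = 0.5 + 1.0 / math.sqrt(6.0)   # larger root of h(a) = 12a^2 - 12a + 1


def P(x, a):
    return 2.0 * (1.0 - a) + (1.0 - 2.0 * a) * x + a * x * x


def x_minus(a):
    disc = (2.0 * a - 1.0) ** 2 - 8.0 * a * (1.0 - a)
    # max(., 0) only protects against rounding right at a = ALPHA_STAR.
    return ((2.0 * a - 1.0) - math.sqrt(max(disc, 0.0))) / (2.0 * a)


def check(n_alpha1=60, n_alpha=50, n_u=100):
    """Grid check of the claims made for alpha_1 in (ALPHA_STAR, 1)."""
    for i in range(1, n_alpha1):
        a1 = ALPHA_STAR + (1.0 - ALPHA_STAR) * i / n_alpha1
        xm = x_minus(a1)
        if xm > 1.0 or xm < 2.0 * (1.0 / a1 - 1.0) - 1e-12:
            return False
        for j in range(1, n_alpha + 1):
            a = a1 * j / n_alpha                 # alpha in (0, alpha_1]
            for k in range(n_u):
                if P(xm * k / n_u, a) <= 0.0:    # u in [0, x_-(alpha_1))
                    return False
    return True


print(check())
```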
So we have shown that $\Re(f(z))$ decreases when the real part of $z$ increases,
when going along the line intersecting the real axis at $c$ and making an angle
of $\pi/3$ with it. If $\alpha_1 < 0.5 + 1/\sqrt{6}$, we can cross the whole plane along this
line and $\Re(f(z))$ continues to decrease. If $\alpha_1 \ge 0.5 + 1/\sqrt{6}$, we are guaranteed
that the property holds until $\Re(z) = 1/\lambda_1$.

Hence the claim we made about $\Gamma_1$ being a descent path of $\Re(f)$ is verified.
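
The descent property on $\Gamma_1$ can also be illustrated numerically. The sketch below (Python, standard library only) uses a hypothetical two-atom spectrum, computes $c$ and $\mu_{n,p}$ as in Section 3.3, and evaluates $\Re(f(z))$ through $\log|z|$ and $\log|1 - z\lambda|$ (so no branch of the logarithm has to be chosen) at equally spaced points of $\Gamma_1$ from $c$ up to $\Re(z) = 1/\lambda_1$. The constant $\mu_{n,p}q$ is dropped, since it does not affect monotonicity.

```python
import cmath
import math


def solve_c(eigs, n, iters=200):
    # Root of (12) in [0, 1/lambda_1) by bisection (H_p = empirical law of eigs).
    p = len(eigs)

    def lhs(x):
        if x * max(eigs) >= 1.0:
            return math.inf
        return sum((l * x / (1.0 - l * x)) ** 2 for l in eigs) / p

    lo, hi = 0.0, 1.0 / max(eigs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lhs(mid) < n / p else (lo, mid)
    return 0.5 * (lo + hi)


def re_f(z, mu, eigs, n):
    # Re f(z) from (11), up to the constant mu*q; moduli avoid any branch choice.
    return (-mu * z.real + math.log(abs(z))
            - sum(math.log(abs(1.0 - z * l)) for l in eigs) / n)


def gamma1_values(eigs, n, steps=300):
    c = solve_c(eigs, n)
    mu = 1.0 / c + sum(l / (1.0 - l * c) for l in eigs) / n
    direction = cmath.exp(1j * math.pi / 3)
    t_max = 2.0 * (1.0 / max(eigs) - c)      # Re(c + t e^{i pi/3}) = 1/lambda_1
    return [re_f(c + (t_max * k / steps) * direction, mu, eigs, n)
            for k in range(steps + 1)]


vals = gamma1_values([2.0] * 30 + [1.0] * 70, n=300)   # hypothetical spectrum
print(all(b < a for a, b in zip(vals, vals[1:])))
```

Near $c$ the decrease is only of order $t^3$, as expected at a saddle point of order 2, so the steps are kept moderate to stay well above rounding error.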


3.4.2. Behavior on $\Gamma_2$. As we saw in the previous subsection, this is only
a concern if $\alpha_1 \ge 0.5 + 1/\sqrt{6}$. So we suppose we are in this situation. Before
we proceed to exhibit a path, we perform a preliminary computation that
will prove useful in both this subsection and the next one.


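Independent computation. Suppose we write $z = c(u + iv)$. We have

$$\Re(f(z)) = -\mu c(u - q/c) + \frac{1}{2}\log\big(c^2(u^2 + v^2)\big) - \frac{1}{2\gamma^2}\int \log\big((1 - \alpha u)^2 + \alpha^2v^2\big)\, dH_p(\lambda).$$

If we consider that $v = v(u)$, we have $\Re(f(z)) = g(u)$. The question of finding
a path along which $\Re(f(z))$ decreases when $\Re(z)$ increases is equivalent to
finding $v(u)$ such that $g'(u) < 0$. With this in mind, we observe that

$$g'(u) = -\mu c + \frac{1}{2}\,\frac{2u + 2vv'}{u^2 + v^2} - \frac{1}{2\gamma^2}\int \frac{2\alpha(\alpha u - 1) + 2\alpha^2vv'}{(1 - \alpha u)^2 + \alpha^2v^2}\, dH_p(\lambda).$$

Let us call $I(u) = u^2 + v^2$ and $\beta = u + vv'$. Using the fact that $\mu c\gamma^2 =
\int \alpha/(1 - \alpha)^2\, dH_p(\lambda)$ and $\gamma^2 = \int \alpha^2/(1 - \alpha)^2\, dH_p(\lambda)$, we get

$$\gamma^2 g'(u) = \int \frac{\alpha}{(1 - \alpha)^2}\left[-1 + \alpha\frac{\beta}{I(u)} - \frac{(\alpha\beta - 1)(1 - \alpha)^2}{(1 - \alpha u)^2 + \alpha^2v^2}\right] dH_p(\lambda). \tag{13}$$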

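Back to the topic of $\Gamma_2$. When $1/\lambda_1 < \Re(z) < 1/\lambda_p$, we have $1/\alpha_1 < u <
1/\alpha_p$. In this part of the plane, we propose to choose $\beta = I(u)$. Then, since
$(1 - \alpha u)^2 + \alpha^2v^2 = \alpha^2I(u) - 2\alpha u + 1$, the expression inside the brackets in equation (13) becomes

$$\begin{aligned}
-1 + \alpha - \frac{(\alpha I(u) - 1)(1 - \alpha)^2}{\alpha^2I(u) - 2\alpha u + 1}
&= (\alpha - 1)\left[1 - (\alpha - 1)\frac{\alpha I(u) - 1}{\alpha^2I(u) - 2\alpha u + 1}\right]\\
&= (\alpha - 1)\,\frac{\alpha^2I(u) - 2\alpha u + 1 - \alpha^2I(u) + \alpha + \alpha I(u) - 1}{\alpha^2I(u) - 2\alpha u + 1}\\
&= (\alpha - 1)\alpha\,\frac{I(u) - 2u + 1}{\alpha^2I(u) - 2\alpha u + 1}\\
&= (\alpha - 1)\alpha\,\frac{(u - 1)^2 + v^2}{(1 - \alpha u)^2 + \alpha^2v^2} < 0.
\end{aligned}$$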
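
The chain of equalities above can be verified in exact arithmetic. The sketch below (Python, standard library only; the grid of rational points is an arbitrary choice) evaluates both ends of the chain with $I(u) = u^2 + v^2$ and checks that they agree exactly and are negative whenever $0 < \alpha < 1$ and $v > 0$.

```python
from fractions import Fraction as Fr


def bracket_beta_equals_I(a, u, v):
    # Bracket of (13) with beta = I(u): -1 + alpha - (alpha I - 1)(1 - alpha)^2 / D.
    I = u * u + v * v
    D = a * a * I - 2 * a * u + 1          # equals (1 - a u)^2 + a^2 v^2
    return -1 + a - (a * I - 1) * (1 - a) ** 2 / D


def simplified(a, u, v):
    # Final form of the chain.
    return (a - 1) * a * ((u - 1) ** 2 + v * v) / ((1 - a * u) ** 2 + a * a * v * v)


points = [(Fr(i, 10), Fr(j, 4), Fr(k, 3))
          for i in range(1, 10) for j in range(1, 16) for k in range(1, 6)]
ok = all(bracket_beta_equals_I(*pt) == simplified(*pt) and simplified(*pt) < 0
         for pt in points)
print(ok)
```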
Note that at the end of $\Gamma_1$ we arrived at $u_1 = 1/\alpha_1$ and the corresponding
$v$ was $v_1 = \sqrt{3}(1/\alpha_1 - 1)$. Now the choice of $\beta = I(u)$ can be reformulated as
$I'(u) = 2I(u)$ (since $I'(u) = 2u + 2vv' = 2\beta$), and hence $I(u) = I(u_1)e^{2(u - u_1)}$. Simple algebra shows that finally

$$I(u) = \left(\frac{1}{\alpha_1}\right)^2\big(1 + 3(1 - \alpha_1)^2\big)e^{2(u - 1/\alpha_1)}.$$

Note also that since $u > 1$, $I(u) = u^2 + v^2 > u$ and hence $\beta = u + vv' = I(u)$
implies that $vv' > 0$, as we started with $v_1 > 0$. So we will not cross the real
axis by following this path. In the original coordinates, if we call $z = a + ib$,
the path is such that

$$b^2 = \frac{1}{\lambda_1^2}\big(1 + 3(1 - \alpha_1)^2\big)e^{2(a - 1/\lambda_1)/c} - a^2,$$

with $b > 0$. For $\Gamma_2$, we follow this path until we reach $a = 1/\lambda_p$. Note that
with our assumptions about $\gamma$, $\limsup \alpha_1$ and $\liminf \alpha_p$, the length of this
path is uniformly bounded. We also remark that if $\alpha_p \to 0$, the length of $\Gamma_2$
grows to $\infty$, which causes problems for the control of the operator later on.

3.4.3. Behavior on $\Gamma_3$. We revert to the notation $z = c(u + iv)$. The
point is just to show that with $v' = 0$ when $u > 1/\alpha_p$, $\Re(f(z))$ decreases.
If we recall (13), we realize that if $\alpha\beta < I(u)$ and $\alpha\beta > 1$, then $g'(u) < 0$.
But when $v' = 0$, $\beta = u$. Now, if $u > 1/\alpha_p$, $\alpha u = \alpha\beta \ge \alpha_p\beta > 1$. Also, since
$u < I(u)$ and $\alpha < 1$, $\alpha u < I(u)$. Hence $\Re(f(z))$ is decreasing when moving
along $\Gamma_3$.
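
To see $\Gamma_2$ and $\Gamma_3$ at work, the sketch below (Python, standard library only) follows the explicit curve $b^2 = \lambda_1^{-2}(1 + 3(1 - \alpha_1)^2)e^{2(a - 1/\lambda_1)/c} - a^2$ from $\Re(z) = 1/\lambda_1$ to $\Re(z) = 1/\lambda_p$, and then the horizontal segment $\Gamma_3$ up to a hypothetical value $R_1 = 5$, for a hypothetical two-atom spectrum. It checks that the curve starts at the endpoint $d$ of $\Gamma_1$ and that $\Re(f(z))$ (computed up to the constant $\mu_{n,p}q$) decreases along both pieces. This is an illustration under these assumptions, not part of the proof.

```python
import cmath
import math


def solve_c(eigs, n, iters=200):
    # Root of (12) in [0, 1/lambda_1) by bisection (H_p = empirical law of eigs).
    p = len(eigs)

    def lhs(x):
        if x * max(eigs) >= 1.0:
            return math.inf
        return sum((l * x / (1.0 - l * x)) ** 2 for l in eigs) / p

    lo, hi = 0.0, 1.0 / max(eigs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lhs(mid) < n / p else (lo, mid)
    return 0.5 * (lo + hi)


def re_f(z, mu, eigs, n):
    # Re f(z) from (11), up to the constant mu*q; moduli avoid any branch choice.
    return (-mu * z.real + math.log(abs(z))
            - sum(math.log(abs(1.0 - z * l)) for l in eigs) / n)


def is_decreasing(values):
    return all(b < a for a, b in zip(values, values[1:]))


def gamma2_gamma3(eigs, n, R1=5.0, steps=400):
    lam1, lamp = max(eigs), min(eigs)
    c = solve_c(eigs, n)
    mu = 1.0 / c + sum(l / (1.0 - l * c) for l in eigs) / n
    a1 = c * lam1
    const = (1.0 + 3.0 * (1.0 - a1) ** 2) / lam1 ** 2

    def b_of(a):
        # Gamma_2 in the coordinates z = a + ib (b > 0).
        return math.sqrt(const * math.exp(2.0 * (a - 1.0 / lam1) / c) - a * a)

    a_grid = [1.0 / lam1 + (1.0 / lamp - 1.0 / lam1) * k / steps for k in range(steps + 1)]
    g2 = [complex(a, b_of(a)) for a in a_grid]
    top = g2[-1].imag                      # Gamma_3 is horizontal at this height
    g3 = [complex(1.0 / lamp + (R1 - 1.0 / lamp) * k / steps, top) for k in range(steps + 1)]
    d = c * (1.0 + 2.0 * (1.0 / a1 - 1.0) * cmath.exp(1j * math.pi / 3))  # end of Gamma_1
    return (abs(g2[0] - d),
            is_decreasing([re_f(z, mu, eigs, n) for z in g2]),
            is_decreasing([re_f(z, mu, eigs, n) for z in g3]))


gap, dec2, dec3 = gamma2_gamma3([2.0] * 20 + [1.0] * 80, n=200)   # hypothetical spectrum
print(gap, dec2, dec3)
```

The curve is used here for any spectrum, even though the text only needs it when $\alpha_1 \ge 0.5 + 1/\sqrt{6}$; the negativity of the bracket in (13) does not depend on that restriction.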

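3.4.4. Behavior on $\Gamma_4$. There, $z = R_1 + iy$, where, with a slight abuse of
notation, $0 \le y \le \Gamma_2(1/\alpha_p)$, the height reached by $\Gamma_2$ at its endpoint. Now

$$\Re(f(z)) = -\mu(R_1 - q) + \frac{1}{2}\log(R_1^2 + y^2) - \frac{1}{2\gamma^2}\int \log\big((\lambda R_1 - 1)^2 + \lambda^2y^2\big)\, dH_p(\lambda),$$

so, since $\mu$ is bounded away from 0, if $R_1 \to \infty$, $\Re(f(z)) \to -\infty$, and we can
pick $R_1$ so that, uniformly for our models, $\max_{z \in \Gamma_4}\Re(f(z)) < \Re(f(d))$.

This is a simple consequence of the fact that $\Re(f(d(\Sigma_p, n, p)))$ is bounded
below under the assumptions of Theorem 1. (See Appendix A.1.1.)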

There is of course a problem of definition of $f$ at $y = 0$, because the
argument of the logarithm is real and negative. Nevertheless, the function
$h(z) = \prod_{k=1}^p (1 - z\lambda_k)^{-1}$ is well defined and well behaved at
$z = R_1$, so this definition problem will cause no harm in the analysis of the
convergence of the operators. As a matter of fact, it turns out that we will
just be interested in bounding $|h(z)|$. Since we can take the log of $|h(z)|$
without any problems and it leads to the same expression as the one for
$\Re(f(z))$ we considered, we can safely ignore the problem of definition of $f$ for
all practical purposes.

3.5. About Σ. We use the same conventions as when we studied Γ. Namely, we will study the behavior of f on Σ, but we will first exhibit Σ_+, where Σ is the union of Σ_+ and its complex conjugate. It turns out that the analysis is much simpler for this contour, and we will be able to follow the path used in [6] after doing some precise technical work.

Once again, the problem is very graphical. A drawing of Σ_+ is shown in Figure 3.

What we will have to do in this case is to show that Re(−f(z)) is decreasing when we travel along Σ_1 and Σ_2, and that it stays below Re(−f(e)) on Σ_3.

This time Σ_1 is defined as a line making an angle of 2π/3 with the real axis and crossing it at c. Σ_2 is a line that runs parallel to the real axis, in the direction of −∞.

The aim of this subsection is to show the following lemma:

Lemma 2. Under the assumptions of Theorem 1, Re(−f(z)) is decreasing for z ∈ Σ_1 ∪ Σ_2 as Re(z) decreases. Also, the length of Σ_+ is uniformly bounded. Finally, there exists R_2 > 0 such that max_{z∈Σ_3} Re(−f(z)) < Re(−f(e)), where e = e(Σ_p, n, p) = ic√3.


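3.5.1. Case of Σ_1. Once again, we will consider everything on the c scale. We define Σ_1 as z = c(1 + xe^{2iπ/3}), x ≥ 0. We have

$$\phi_1(x) \triangleq \Re\big(-f(c(1 + xe^{2i\pi/3}))\big) = \Re\big(-f\big(c(1 - x/2 + ix\sqrt{3}/2)\big)\big)$$

$$= \mu c(1 - x/2) - \frac{1}{2}\log(1 - x + x^2) + \frac{1}{2\gamma^2}\int \log\Big(\big(1 - \alpha(1 - x/2)\big)^2 + \frac{3\alpha^2x^2}{4}\Big)\,dH_p(\lambda) - \frac{1}{2}\log(c^2) - \mu q.$$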


Hence, we get

$$\phi_1'(x) = -\frac{\mu c}{2} - \frac{2x - 1}{2(1 - x + x^2)} + \frac{1}{2\gamma^2}\int \frac{2\alpha^2x + \alpha(1 - \alpha)}{(1 - \alpha)^2 + x\alpha(1 - \alpha) + \alpha^2x^2}\,dH_p(\lambda).$$

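Therefore, using the same equalities we used when studying Γ, namely 1 = γ^{−2}∫α²/(1 − α)² dH_p(λ) [this is (1)] and μc = γ^{−2}∫α/(1 − α)² dH_p(λ) [which follows from (1) and (2)], we have

$$2\gamma^2\phi_1'(x) = \int\left[-\frac{\alpha}{(1-\alpha)^2} - \frac{\alpha^2}{(1-\alpha)^2}\,\frac{2x-1}{1-x+x^2} + \frac{2\alpha^2x + \alpha(1-\alpha)}{\alpha^2x^2 + \alpha(1-\alpha)x + (1-\alpha)^2}\right] dH_p(\lambda)$$

$$= \int \frac{\alpha}{(1-\alpha)^2}\left(-1 - \alpha\,\frac{2x-1}{1-x+x^2} + \frac{(2\alpha x + (1-\alpha))(1-\alpha)^2}{\alpha^2x^2 + \alpha(1-\alpha)x + (1-\alpha)^2}\right) dH_p(\lambda).$$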

As before, the expression that is within the parentheses can be written

$$\frac{\sum_{k=0}^{4} c_k x^k}{\big((1 - x/2)^2 + 3x^2/4\big)\big(\alpha^2x^2 + \alpha(1-\alpha)x + (1-\alpha)^2\big)},$$

and we know that the denominator is positive. A simple computation leads to


c_0 = 0,
c_1 = 0,
c_2 = −2α + 2α²,
c_3 = α − 2α²,
c_4 = −α²,

and hence the numerator is

$$\sum_{k=0}^{4} c_k x^k = -x^2\alpha\big(\alpha x^2 + (2\alpha - 1)x + 2(1 - \alpha)\big).$$
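The expansion above can be spot-checked numerically. This is a hedged sketch and not part of the original argument. For a few values of α, it expands the numerator −(1 − x + x²)E(x) − α(2x − 1)E(x) + (2αx + 1 − α)(1 − α)²(1 − x + x²), where E(x) = α²x² + α(1 − α)x + (1 − α)², using NumPy's polynomial helpers. It then compares the result with −αx²(αx² + (2α − 1)x + 2(1 − α)). The function names are ours and purely illustrative.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def pad(coeffs, length=5):
    """Zero-pad a coefficient array (lowest degree first) to a fixed length."""
    out = np.zeros(length)
    out[: len(coeffs)] = coeffs
    return out

def sigma1_numerator(alpha):
    """Numerator of the bracket in 2 gamma^2 phi_1'(x), expanded in powers of x."""
    F = np.array([1.0, -1.0, 1.0])                                    # 1 - x + x^2
    E = np.array([(1 - alpha) ** 2, alpha * (1 - alpha), alpha ** 2])  # alpha^2 x^2 + ...
    t1 = -npoly.polymul(F, E)
    t2 = -npoly.polymul(np.array([-alpha, 2 * alpha]), E)              # -alpha (2x - 1) E
    t3 = (1 - alpha) ** 2 * npoly.polymul(np.array([1 - alpha, 2 * alpha]), F)
    return pad(npoly.polyadd(npoly.polyadd(t1, t2), t3))

def claimed_numerator(alpha):
    """-alpha x^2 (alpha x^2 + (2 alpha - 1) x + 2 (1 - alpha))."""
    quad = np.array([2 * (1 - alpha), 2 * alpha - 1, alpha])
    return pad(-alpha * npoly.polymul(np.array([0.0, 0.0, 1.0]), quad))

for a in (0.05, 0.3, 0.5, 0.8, 1.0):
    print(a, np.allclose(sigma1_numerator(a), claimed_numerator(a)))
```

Every line should report True, which is consistent with c_0 = c_1 = 0, c_2 = −2α + 2α², c_3 = α − 2α² and c_4 = −α².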

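The same questions we asked when dealing with Γ_1 now come up. Note that our x is positive, so if α ≥ 1/2, P(x, α) = αx² + (2α − 1)x + 2(1 − α) > 0. Also, at α fixed, the roots of P(·, α) are

$$x_\pm(\alpha) = \frac{1 - 2\alpha \pm \sqrt{12\alpha^2 - 12\alpha + 1}}{2\alpha}.$$

The discriminant 12α² − 12α + 1 vanishes at α = 1/2 ± 1/√6 and is negative in between. So, as we saw before, we have P(x, α) > 0 on (0, ∞) × (1/2 − 1/√6, 1].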

Now if α ≤ 1/2 − 1/√6, we have to work a little harder. We remark that if α ∈ (0, 1/2 − 1/√6], it is easy to check that x_−(α) > 2. Hence we conclude that

P(x, α) > 0 on (0, 2] × (0, 1].

Now z(2) = 0 + ic√3 = e. So we have shown that Re(−f(z)) decreases when we travel from c to e along Σ_1.
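The two facts used here can be spot-checked on a grid:

- P(x, α) > 0 on (0, 2] × (0, 1].
- x_−(α) > 2 for α ∈ (0, 1/2 − 1/√6].

This is a minimal sketch assuming NumPy. A grid check illustrates the claims but is of course not a proof.

```python
import numpy as np

ALPHA_STAR = 0.5 - 1 / np.sqrt(6)   # smaller zero of the discriminant 12a^2 - 12a + 1

def P(x, a):
    """P(x, a) = a x^2 + (2a - 1) x + 2(1 - a)."""
    return a * x**2 + (2 * a - 1) * x + 2 * (1 - a)

def x_minus(a):
    """Smaller root of P(., a), meant for 0 < a <= ALPHA_STAR (discriminant clipped at 0)."""
    disc = np.maximum(12 * a**2 - 12 * a + 1, 0.0)
    return (1 - 2 * a - np.sqrt(disc)) / (2 * a)

xs = np.linspace(0.0, 2.0, 401)[1:]          # x in (0, 2]
alphas = np.linspace(0.0, 1.0, 401)[1:]      # alpha in (0, 1]
X, A = np.meshgrid(xs, alphas)
print("min of P on the grid:", P(X, A).min())

small = np.linspace(1e-4, ALPHA_STAR, 200)
print("min of x_minus on (0, ALPHA_STAR]:", x_minus(small).min())
```

Near α = 0 the root x_−(α) approaches 2 from above (roughly 2 + 6α), which is why the interval (0, 2] is the natural one here.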


3.5.2. Case of Σ_2. On this part of the path we use the parametrization z = −xc + ic√3, with x > 0 and increasing. If K is a constant (at Σ_p, n and p fixed), we have

$$\Re(f(z)) = \mu xc + \frac{1}{2}\log(x^2 + 3) - \frac{1}{2\gamma^2}\int \log\big((1 + \alpha x)^2 + 3\alpha^2\big)\,dH_p(\lambda) + K.$$

Calling φ_2(x) = Re(−f(z)), we hence get

$$\phi_2(x) = -\mu xc - \frac{1}{2}\log(x^2 + 3) + \frac{1}{2\gamma^2}\int \log\big((1 + \alpha x)^2 + 3\alpha^2\big)\,dH_p(\lambda) - K.$$

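Using the same approach as before, we find that

$$\gamma^2\phi_2'(x) = \int \frac{\alpha}{(1-\alpha)^2}\left[-1 - \frac{\alpha x}{x^2 + 3} + \frac{(1 + \alpha x)(1 - \alpha)^2}{(1 + \alpha x)^2 + 3\alpha^2}\right] dH_p(\lambda).$$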


Once again, what matters to us is the numerator of what is within the bracket. It is a polynomial of degree 4 in x, which we denote by Q(x, α). Its coefficients are

c_0 = −6α(1 + α),
c_1 = −2α(2 + 3α),
c_2 = −α(2 + 7α),
c_3 = −α(1 + 2α),
c_4 = −α².

Hence it is clear that Q(x, α) < 0 on [0, ∞) × (0, 1]. Therefore, we have shown that Re(−f(z)) decreases as Re(z) decreases and z travels on Σ_2.
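As with Σ_1, a hedged numerical check can be run. The sketch below expands the numerator −(x² + 3)D(x) − αxD(x) + (1 − α)²(1 + αx)(x² + 3), where D(x) = (1 + αx)² + 3α², and compares it with the coefficients listed above. It also confirms that Q is negative on a grid of x ≥ 0. NumPy is assumed, and the helper names are ours.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def sigma2_numerator(alpha):
    """Numerator of the bracket in gamma^2 phi_2'(x), lowest degree first."""
    S = np.array([3.0, 0.0, 1.0])                               # x^2 + 3
    D = np.array([1 + 3 * alpha**2, 2 * alpha, alpha**2])       # (1 + alpha x)^2 + 3 alpha^2
    t1 = -npoly.polymul(S, D)
    t2 = -npoly.polymul(np.array([0.0, alpha]), D)              # -alpha x D
    t3 = (1 - alpha) ** 2 * npoly.polymul(np.array([1.0, alpha]), S)
    s = npoly.polyadd(npoly.polyadd(t1, t2), t3)
    out = np.zeros(5)
    out[: len(s)] = s
    return out

def listed_coefficients(alpha):
    """c_0, ..., c_4 as listed in the text."""
    a = alpha
    return np.array([-6 * a * (1 + a), -2 * a * (2 + 3 * a), -a * (2 + 7 * a),
                     -a * (1 + 2 * a), -a**2])

xs = np.linspace(0.0, 50.0, 2001)
for a in (0.01, 0.2, 0.5, 0.9, 1.0):
    q = sigma2_numerator(a)
    print(a, np.allclose(q, listed_coefficients(a)), npoly.polyval(xs, q).max() < 0)
```

Since all five coefficients are negative for α ∈ (0, 1], Q(x, α) ≤ c_0 < 0 for every x ≥ 0. The grid check merely illustrates this.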

3.5.3. Case of Σ_3. Here z = −R_2 + iy, where 0 ≤ y ≤ c√3. It is easy to see that Re(−f(z)) can be made as small as we want, since μ is bounded away from 0. We show in Appendix A.1.2 that Re(−f(e)) is bounded below. In particular, if we choose R_2 large enough,

$$\max_{z \in \Sigma_3} \Re(-f(z)) < \Re(-f(e)).$$

This holds uniformly with respect to our covariance models, if they are in 𝒢.


3.6. Study of A_{n,p}. We give an outline of the key ideas and results that then allow us to proceed to operator convergence issues. The proof is given in Appendix B.


3.6.1. Definition of q and modification of Γ to get Γ̃. At this point we still have not set q, and it is now time to do it. Let us pick ε > 0. Then set

$$q = q(\Sigma_p, n, p) = c - \frac{\epsilon}{n^{1/3}\sigma_{n,p}}. \qquad (14)$$

Then, as in [6], we just have to modify the curve Γ_+ around c to obtain Γ̃_+. Γ̃_+ is the same as Γ_+, except that it starts with

$$\Gamma_0 = \Big\{c + \frac{2\epsilon}{n^{1/3}\sigma_{n,p}}\,e^{i\theta} : \frac{\pi}{3} \le \theta \le \pi\Big\}.$$

When Γ_0 reaches Γ_1, we follow Γ_1, and then follow Γ_2, Γ_3 and Γ_4 to create Γ̃_+. Then Γ̃ is the union of Γ̃_+ and its complex conjugate. Of course, in the end, the contour Γ̃ is oriented counterclockwise. A depiction of Γ̃_+ can be found in Figure 4.
3.6.2. Arguments needed for the operator analysis to go through. The method of proof is similar to that of [6], once the difficulties stemming from the fact that we are considering a much more general case are understood.

The issue we will face is to find a sequence K_{n,p} such that K_{n,p}A_{n,p} → e^{−εx}Ai(x) and K_{n,p}A_{n,p} goes to zero exponentially fast at infinity.

The analysis will rely on four key points. They are:

1. The length of Γ̃ is uniformly bounded with respect to our models. We will justify in Appendix B.1 why this is the case in the situation we are considering.

2. One needs to be able to find δ > 0 such that



δ has of course to be uniform with respect to our models.

3. We also need



4. Finally, δ has to be chosen small enough that the disc of center c and radius δ encompasses neither d(Σ_p, n, p) nor e(Σ_p, n, p).

We will explain in Appendix B.1 why these conditions are fulfilled under the assumptions of Theorem 1, and then prove in Appendix B.2 the following proposition:


Fig. 4. The curve Γ̃_+.

Fig. 5. The curve Σ̃_+.


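Proposition 1. In the definition [see (9)] of A_{n,p}, let μ_{n,p} be equal to μ in (2), and let σ_{n,p} be obtained from σ defined in (3). When the four conditions above are fulfilled, we have

∀s_0 ∈ ℝ, ∃C(s_0) ∈ ℝ_+ and N_0 ∈ ℕ such that

$$\big|K_{n,p}A_{n,p}(s) - e^{-\epsilon s}\mathrm{Ai}(s)\big| < C(s_0)\,\frac{e^{-\epsilon s/2}}{n^{1/3}}$$

if s > s_0 and n > N_0. Here K_{n,p} is a normalizing constant that does not depend on s.

As a function of s_0, C can be chosen to be continuous and nonincreasing.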

3.7. Study of B_{n,p}. Here also, Σ needs to be modified. We start with Σ_0, an arc of a circle centered at c with radius 3ε/(n^{1/3}σ_{n,p}). Formally, Σ_0 = c + (3ε/(n^{1/3}σ_{n,p}))e^{i(π−θ)}, with 0 ≤ θ ≤ π/3. When Σ_0 intersects Σ_1, we follow Σ_1, and so on. A depiction of Σ̃_+ can be found in Figure 5.

We then have:

Proposition 2. In the definition [see (10)] of B_{n,p}, let μ_{n,p} be equal to μ in (2), and let σ_{n,p} be obtained from σ defined in (3), as in Proposition 1. When the four conditions in Section 3.6.2 are fulfilled,

∀s_0 ∈ ℝ, ∃C(s_0) ∈ ℝ_+ and N_0 ∈ ℕ such that

$$\big|B_{n,p}(s)/K_{n,p} - e^{\epsilon s}\mathrm{Ai}(s)\big| < C(s_0)\,\frac{e^{-\epsilon s/2}}{n^{1/3}}$$

if s > s_0 and n > N_0. Here K_{n,p} is the same as in Proposition 1.

As a function of s_0, C can be chosen to be continuous and nonincreasing.

Explanations are postponed to Appendix B.3.

3.8. Operator convergence issues. We will mostly rely on two key properties in this subsection:

- the relationship between trace class and Hilbert-Schmidt norms;
- the fact that the determinant det(I − ·) of trace class operators is a locally Lipschitz function with respect to the trace class norm.

More precisely, recall (see [16], Section IV.7) that if O and P are Hilbert-Schmidt operators, then OP is a trace class operator and

$$\|OP\|_1 \le \|O\|_2\,\|P\|_2.$$

It is also well known (see [16], Theorem IV.5.2, and Theorem II.4.1 and Corollary II.4.2, both due to Seiler-Simon) that if Q and R are trace class operators, then

$$\big|\det(I + Q) - \det(I + R)\big| \le \|Q - R\|_1 \exp\big(\|Q\|_1 + \|R\|_1 + 1\big). \qquad \text{(Lip)}$$
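Both inequalities hold in particular for finite matrices. There, the Fredholm determinant is the ordinary determinant, the trace norm is the sum of the singular values and the Hilbert-Schmidt norm is the Frobenius norm. The following sketch spot-checks the two inequalities on random matrices of our choosing (NumPy assumed). It illustrates the operator statements but does not prove them.

```python
import numpy as np

def trace_norm(M):
    """Trace (nuclear) norm: sum of singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

def hs_norm(M):
    """Hilbert-Schmidt norm of a matrix, i.e. its Frobenius norm."""
    return np.linalg.norm(M, "fro")

def check_inequalities(rng, m=6, scale=0.5):
    """Return (HS product bound holds, Seiler-Simon (Lip) bound holds) for random inputs."""
    O, P, Q = (scale * rng.standard_normal((m, m)) for _ in range(3))
    R = Q + 0.1 * scale * rng.standard_normal((m, m))
    I = np.eye(m)
    hs_ok = trace_norm(O @ P) <= hs_norm(O) * hs_norm(P) * (1 + 1e-12)
    lhs = abs(np.linalg.det(I + Q) - np.linalg.det(I + R))
    rhs = trace_norm(Q - R) * np.exp(trace_norm(Q) + trace_norm(R) + 1)
    return hs_ok, lhs <= rhs

rng = np.random.default_rng(0)
results = [check_inequalities(rng) for _ in range(200)]
print(all(h for h, _ in results), all(l for _, l in results))
```

The small relative slack in the first comparison only guards against floating-point round-off.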

The rest of this section is devoted to proving Lemma 3 and the two remaining parts of Theorem 1 (convergence and rate of convergence).

Let us call E the multiplication operator by e^{−εx} and Ai_s the operator on L²([s, ∞)) with kernel Ai(x + y − s).

Lemma 3. Using the conclusions of Propositions 1 and 2, we have, if we view all the operators as operators on L²([s, ∞)):

∀s_0 ∈ ℝ, ∃C ∈ ℝ_+ and N_0 ∈ ℕ such that

$$\big\|A_{n,p}B_{n,p} - E\,\mathrm{Ai}_s^2\,E^{-1}\big\|_1 < C(s_0)\,\frac{e^{-\epsilon s/2}}{n^{1/3}}$$

if s > s_0 and n > N_0. C, as a function of s_0, can be chosen to be continuous and nonincreasing.

Proof. Recall the following fact: according to [28], page 394, for x > 0, Ai(x) ≤ exp(−2x^{3/2}/3)/(2π^{1/2}x^{1/4}). Hence it is clear that the operator P with kernel P(x, y) = P(x + y − s) = Ai(x + y − s) exp(ε(x + y − s)) is Hilbert-Schmidt on L²([s, ∞)). The same holds for the operator O with kernel O(x, y) = O(x + y − s) = Ai(x + y − s) exp(−ε(x + y − s)).
More precisely, these kernels, as functions of (x, y), are square integrable on [s, ∞) × [s, ∞). Theorem VI.23 in [30] therefore applies, and we see that, for instance, if we view O as an operator on L²([s, ∞)),

$$\|O\|_2^2 = \iint_{[s,\infty)^2} O(x + y - s)^2\,dx\,dy = \iint_{[0,\infty)^2} O(x + y + s)^2\,dx\,dy = \int_{x=s}^{\infty}\int_{y=0}^{\infty} O(x + y)^2\,dy\,dx.$$

It is clear that this is a continuous, nonincreasing function of s having limit 0 at ∞. The same analysis and conclusion apply to P.
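Substituting t = x + y in the last double integral reduces it to a single integral, ∫_s^∞ (t − s)O(t)² dt with O(t) = Ai(t)e^{−εt}. This makes the monotonicity in s transparent: the derivative in s is −∫_s^∞ O(t)² dt < 0. The sketch below evaluates this quantity for O and for P (the latter by flipping the sign of ε), and also checks the Airy bound on a grid. It assumes SciPy's `scipy.special.airy` and `scipy.integrate.quad`. The value ε = 0.1 and the truncation of the integral are illustrative choices of ours.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import airy

def hs_norm_sq(s, eps):
    """||O||_2^2 on L^2([s, inf)) for the kernel Ai(x+y-s) exp(-eps (x+y-s)).

    Equals int_s^inf (t - s) Ai(t)^2 exp(-2 eps t) dt; eps < 0 gives the quantity for P.
    The upper limit is truncated where Ai is numerically negligible.
    """
    integrand = lambda t: (t - s) * airy(t)[0] ** 2 * np.exp(-2 * eps * t)
    upper = max(s, 0.0) + 30.0
    val, _ = quad(integrand, s, upper, epsabs=0.0, epsrel=1e-8, limit=200)
    return val

def airy_bound(x):
    """The bound exp(-2 x^{3/2} / 3) / (2 sqrt(pi) x^{1/4}) on Ai(x), x > 0."""
    return np.exp(-2 * x**1.5 / 3) / (2 * np.sqrt(np.pi) * x**0.25)

grid = np.linspace(0.05, 12.0, 500)
print("bound holds on grid:", np.all(airy(grid)[0] <= airy_bound(grid)))
for s in (-3.0, -1.0, 0.0, 1.0, 3.0, 6.0):
    print(s, hs_norm_sq(s, 0.1), hs_norm_sq(s, -0.1))
```

The printed values decrease in s for both signs of ε and are already negligible around s = 8. This is consistent with the claim that the Hilbert-Schmidt norms tend to 0 at ∞.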

Now let us denote Ã_{n,p} = K_{n,p}A_{n,p} and B̃_{n,p} = B_{n,p}/K_{n,p}. From the previous analyses we conclude that we can find a continuous, nonincreasing function C with the following property. If we view all the operators as operators on L²([s, ∞)) with s > s_0, then ||Ã_{n,p}||_2 < C(s_0) and ||P||_2 < C(s_0), and similarly for B̃_{n,p} and O, as long as n > N_0(s_0). For instance, C(s) could be 2(||O||_2(s) + ||P||_2(s)), where we have highlighted the dependence of the Hilbert-Schmidt norms of O and P on s.

Since Ã_{n,p}B̃_{n,p} − OP = Ã_{n,p}(B̃_{n,p} − P) + (Ã_{n,p} − O)P, we have

$$\|\tilde A_{n,p}\tilde B_{n,p} - OP\|_1 \le \|\tilde A_{n,p}\|_2\,\|\tilde B_{n,p} - P\|_2 + \|\tilde A_{n,p} - O\|_2\,\|P\|_2.$$

Using the estimates obtained in Propositions 1 and 2, we have shown that if we view Ã_{n,p}B̃_{n,p} − OP as an operator on L²([s, ∞)) with s > s_0, then

$$\|\tilde A_{n,p}\tilde B_{n,p} - OP\|_1 < C(s_0)\,\frac{e^{-\epsilon s/2}}{n^{1/3}}$$

if n > N_0(s_0), for yet another continuous, nonincreasing function C. Finally, Ã_{n,p}B̃_{n,p} = A_{n,p}B_{n,p}, and OP and E Ai_s² E^{−1} have the same kernel, so Lemma 3 is proved. □


Let us call F_2 the cumulative distribution function of the Tracy-Widom distribution arising in the study of the Gaussian unitary ensemble. Recall that, as explained in [35], formula (4.5) and page 166, F_2(s) = det(I − Ai_s²), where Ai_s is viewed as an operator on L²([s, ∞)). Note that E Ai_s and Ai_s E^{−1} are clearly Hilbert-Schmidt on L²([s, ∞)). Therefore det(I − E Ai_s Ai_s E^{−1}) = det(I − Ai_s E^{−1} E Ai_s) = det(I − Ai_s²).
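The identity F_2(s) = det(I − Ai_s²) also gives a practical way to evaluate F_2. Here Ai_s² is the Airy kernel K(x, y) = (Ai(x)Ai'(y) − Ai'(x)Ai(y))/(x − y) on L²([s, ∞)), with diagonal Ai'(x)² − xAi(x)². The sketch below discretizes the Fredholm determinant with Gauss-Legendre quadrature, a technique due to Bornemann that is not used in this paper. The truncation of [s, ∞) to [s, s + 16] and the number of nodes are our assumptions, and SciPy's `airy` is assumed available. As a check, it approximately reproduces the nominal levels in the "TW" column of Tables 1 and 2.

```python
import numpy as np
from scipy.special import airy

def tw2_cdf(s, m=60, length=16.0):
    """F_2(s) = det(I - K_Airy) on L^2([s, inf)), truncated to [s, s + length]."""
    nodes, weights = np.polynomial.legendre.leggauss(m)
    x = s + (nodes + 1.0) * length / 2.0          # map [-1, 1] to [s, s + length]
    w = weights * length / 2.0
    ai, aip, _, _ = airy(x)
    X, Y = np.meshgrid(x, x, indexing="ij")
    with np.errstate(divide="ignore", invalid="ignore"):
        K = (np.outer(ai, aip) - np.outer(aip, ai)) / (X - Y)
    np.fill_diagonal(K, aip**2 - x * ai**2)        # limit of the kernel on the diagonal
    sw = np.sqrt(w)
    return np.linalg.det(np.eye(m) - sw[:, None] * K * sw[None, :])

for q in (-3.73, -1.81, 0.48):
    print(q, round(tw2_cdf(q), 4))
```

Symmetrizing with √w on both sides keeps the discretized operator symmetric, which is the usual way this quadrature rule is written.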

We also have

$$P\Big(\frac{l_1 - n\mu_{n,p}}{n^{1/3}\sigma_{n,p}} \le s\Big) = \det(I - A_{n,p}B_{n,p}).$$

The continuity of the determinant det(I − ·) with respect to the trace class norm implies that

$$P\Big(\frac{l_1 - n\mu_{n,p}}{n^{1/3}\sigma_{n,p}} \le s\Big) - F_2(s) \to 0.$$

The convergence part of Theorem 1 is therefore proved.

We now turn to proving the rate of convergence part of Theorem 1. In other words, we want to show the following. We can find a function C (continuous and nonincreasing if we wish) such that, for all s_0, s > s_0 and n > N_0 imply

$$\Big|P\Big(\frac{l_1 - n\mu_{n,p}}{n^{1/3}\sigma_{n,p}} \le s\Big) - F_2(s)\Big| < \frac{C(s_0)}{n^{1/3}}.$$


Proof. First, in the notation of the previous proof, Ã_{n,p} and B̃_{n,p} are Hilbert-Schmidt operators that converge to O and P, respectively. Hence, when considering our operators as operators on L²([s, ∞)), we have, for all s > s_0 and n large enough,

||A_{n,p}B_{n,p}||_1 ≤ ||Ã_{n,p}||_2 ||B̃_{n,p}||_2 < 2||O||_2 ||P||_2.

This last quantity is less than C(s), where C is a continuous, nonincreasing function going to 0 when s tends to ∞.

Hence, for s > s_0, if n > N_0(s_0), then ||A_{n,p}B_{n,p}||_1 + ||OP||_1 < 3C(s_0), for yet another continuous, nonincreasing function C. We also have

|P((l_1 − nμ_{n,p})/(n^{1/3}σ_{n,p}) ≤ s) − F_2(s)| = |det(I − A_{n,p}B_{n,p}) − det(I − OP)|.

The statement therefore follows from equation (Lip) and the estimate we already have for ||A_{n,p}B_{n,p} − OP||_1.

Since the C's appearing in Propositions 1 and 2 may depend on the models under consideration, so may C. □

Hence the rate of convergence part of Theorem 1 is proved. 

4. Simulations, related issues and conclusion. In this section we discuss some practical consequences of Theorem 1, as well as some of the questions it raises. To simplify the discussion, recall that we denote by 𝒢 the class of (covariance) models for which Theorem 1 applies. We will often abuse the notation and say that a covariance matrix is in 𝒢 to mean that the corresponding model is in 𝒢.

4.1. Finite perturbation of a covariance matrix that is in 𝒢. In this subsection, we discuss some immediate consequences of our analysis for the case of a finite perturbation of a covariance matrix that is in 𝒢. By this we mean the following:

- the data matrices X are now n × (p + k), with rows X_i ~ N_C(0, Σ_{p+k}), where k(p) ≤ K and K ∈ ℕ;
- k eigenvalues larger than λ_1(Σ_p) are added to {λ_1(Σ_p), ..., λ_p(Σ_p)}.

In other words, λ_{k+1}(Σ_{p+k}) = λ_1(Σ_p). This is of course a generalization of the spiked covariance models considered in [6]: the bulk covariance is not restricted to be Id, but is any matrix for which Theorem 1 applies.

We will discuss two cases. First, we will assume that there exists χ > 0 such that λ_1(Σ_{p+k}) < −χ + 1/c(Σ_p, n, p). We will see that in that case Theorem 1 applies. Then we will discuss the case where λ_1(Σ_{p+k}) = 1/c(Σ_p, n, p), and the situation when the multiplicity of this eigenvalue is k_0.

Fact 1. In the spiked situation described above, if there exists χ > 0 such that

λ_1(Σ_{p+k}) < −χ + 1/c(Σ_p, n, p),

then Theorem 1 applies to {(Σ_{p+k}, n, p + k)}.

The proof is elementary and is given in Appendix A.4.1. Intuitively, this means the following. Suppose we perturb a model for which Theorem 1 applies by adding a few leading eigenvalues that are not too large, where too large means larger than 1/c(Σ_p, n, p) − χ for some χ > 0. Then Theorem 1 applies to the perturbed model.

In light of [6], another natural question is to understand what happens when we spike the model by adding k eigenvalues at exactly 1/c(Σ_p, n, p). We have the following result in this case:

Theorem 3. Let us assume that a model in 𝒢 is spiked by adding k eigenvalues at

λ_1(Σ_{p+k}) = ··· = λ_k(Σ_{p+k}) = 1/c(Σ_p, n, p).

The value of k is fixed and is not allowed to change with n or p. Then, calling F_k the distribution functions defined in Definition 1.1 of [6], we have

As in Theorem 1, we have



Note that c, μ and σ refer to the nonspiked model.

A justification is given in Appendix B. We have thus extended Theorem 1.1(a) in [6] to models in the class 𝒢. More information about the F_k's can be found in [5], [6] and Appendix B.1.4.
4.2. Statistical considerations.

4.2.1. Isolated largest eigenvalue vs. largest eigenvalue with a small mass. One of the many very interesting results obtained in [6] was their Theorem 1.1(b). It says, roughly, that if an Id matrix is spiked with eigenvalues larger than 1/c(Σ_p, n, p) + χ, with χ > 0, then l_1 has a completely different type of limiting distribution, and the centering and scaling must be changed. In particular, the scaling must be adjusted from n^{2/3} to n^{1/2}. Whether and how this happens for matrices of the class 𝒢 is currently under investigation by the author of this article. As an aside, note that n^{1/2} is the rate obtained through elementary concentration of measure arguments. We refer to Appendix A.5 and references therein for more details.

Let us go back to our discussion and call this large spike λ_1. If, instead of changing one eigenvalue, we had a small mass ν(p) [with liminf ν(p) > 0] at λ_1, then Theorem 1 would apply. Hence the centering, scaling and limiting distribution of l_1 would differ drastically from the case where λ_1 is isolated. In practice (and in statistical applications), one cannot tell from the data whether there is one eigenvalue (out of, say, 100) that is much larger than the rest, or whether 1% of the eigenvalues are clearly separated from the bulk. One will therefore have to specify precisely which models are considered if the results presented in this paper and those in [6] are used for statistical inference. Note that asymptotics done at fixed spectral distribution lead to Tracy-Widom limits.

For instance, in a hypothesis testing context, the power of tests based on these "large p, large n" asymptotics will depend greatly on the specified alternatives.

4.2.2. Classical asymptotics or lim n/p < ∞ asymptotics? An interesting statistical aspect of Theorem 1 is that we see, in μ, the effect that the whole spectrum of the covariance matrix has on the largest eigenvalue of the empirical covariance matrix. This is very different from the classical situation (i.e., p fixed and n going to ∞). There, at least in the real case and when all the eigenvalues of Σ_p have multiplicity 1,

$$\sqrt{n}\,\Big(\frac{l_1}{n} - \lambda_1\Big) \Rightarrow \mathcal{N}(0, 2\lambda_1^2).$$

(See [2], Theorem 13.5.1.)

In other words, in the classical case, a test based on the largest eigenvalue of the empirical covariance matrix is not sensitive to the whole covariance structure, but only to the value of the true largest eigenvalue. Under the asymptotics we are considering, such a test does (implicitly) take into account the whole structure of the spectrum. This is of course very interesting, for instance, for tests of sphericity.
4.2.3. On Theorem 2 and other random matrices of interest. The joint distribution of the eigenvalues of other random matrices with complex Gaussian entries is also known. A good reference is, for instance, [23], Section 8. Note that all these distributions involve so-called hypergeometric functions of two matrix arguments. An interesting characteristic of these functions is that they have representations in terms of determinants. (Since we are dealing with complex entries, these functions are tied to the unitary group.) We refer, for instance, to Section 4 in [20] for explanations, and in particular to its Theorem 4.2.

The Harish-Chandra-Itzykson-Zuber formula, which is a preliminary to the proof of Theorem 2, is a subcase of Theorem 4.2 of [20], specialized to the exponential function. A natural question is therefore whether one can obtain the same type of representation as the one obtained by Baik-Ben Arous-Johansson-Peche in Theorem 2 for the more general distributions described in [23], Section 8.

Let us also note the interesting recent developments in [9] and [15] concerning problems close to the one we studied. For further statistical considerations in the case of spiked models, see [29].

4.3. Concluding remarks. The convergence of the joint distribution of the k largest eigenvalues of X*X requires tools other than the ones discussed in the main body of the paper. We therefore refer the reader to Appendix A.6 for the proof of Corollary 2. In this subsection, we continue discussing some properties of the largest eigenvalue of X*X.


4.3.1. Convergence in probability and a.s. convergence. In this part of the text only, we highlight the fact that μ depends on Σ_p, n and p by writing μ(Σ_p, n, p). Using Slutsky's lemma, it is clear that in the setting of Theorem 1 or 3, for models in 𝒢,

$$\frac{l_1}{n} - \mu(\Sigma_p, n, p) \to 0$$

in probability.

Since μ(Σ_p, n, p) > 1/c(Σ_p, n, p) and limsup λ_1c < 1, we see that l_1/n is always an inconsistent estimator of λ_1 for models in the class 𝒢. Note that Theorem 1 allows us to quantify (l_1/n) − λ_1 and to explore how this quantity is affected by changes in Σ_p, n and p. In particular, elementary computations show the following: at Σ_p fixed, μ(Σ_p, n, p) − λ_1 (the asymptotic value of (l_1/n) − λ_1) is, unsurprisingly, a decreasing function of n/p. As announced in Corollary 1, we explain in Appendix A.5 how Theorem 1 or 3, combined with concentration of measure arguments, shows that, when the theorems apply,

$$\frac{l_1}{n} - \mu(\Sigma_p, n, p) \to 0 \quad \text{a.s.}$$

We also have the following fact.
Fact 2. Let {Y_ij} be i.i.d. random variables, real or complex, with E(Y_ij) = 0, E(|Y_ij|²) = 1 and E(|Y_ij|⁴) < ∞. Let the n × p matrix X be such that X = YΣ_p^{1/2}, where Y is the n × p matrix whose entries are the Y_ij. Suppose the model {(Σ_p, n, p)} is in 𝒢, and moreover H_p ⇒ H_∞, n/p → ρ and λ_1(Σ_p) → sup support H_∞. Then

$$\frac{l_1}{n} - \mu(\Sigma_p, n, p) \to 0 \quad \text{a.s.},$$

where μ(Σ_p, n, p) is defined in (2).

This is a simple consequence of Theorem 1.1 and its corollary in [4], once we realize that all the limiting quantities involved in that statement are independent of the distributional assumptions made on the Y_ij's. Hence the limit in the case of complex Wishart matrices is the same as the limit in the "general" situation. In particular, this covers the case of real Wishart matrices, that is, data matrices with real normal entries.


4.3.2. Some simulations. It was remarked in [25] that the Tracy-Widom approximation to the marginal distribution of l_1 is of very good quality, especially in the right tail of the distribution. This is one of the remarkable properties of this approximation. We refer to [25], Table 1, page 302 for examples. As an aside, the simulation mentioned there was not done with complex Wishart matrices, but rather with real random variables. Nevertheless, the same observations hold for complex Wishart matrices with Id covariance. We refer to [11] for theoretical considerations that help explain why this happens, and for some simulations in the complex Wishart case.

We ran a few simulations to show that the same phenomenon seems to occur in the more involved setting treated in this paper. Note that numerically solving (1), (2) and (3) to get approximations for c, μ and σ takes a fraction of a second on modern computers. We present some results of our experiments in this discussion; see Tables 1 and 2.

We also ran some simulations with real Wishart matrices instead of complex ones. In the setting of Theorem 1, the empirical distribution of l_1(X'X) agreed very reasonably with a Tracy-Widom approximation. This time we used the Tracy-Widom law appearing in the study of the GOE, but kept the c, μ and σ obtained in Theorem 1.

Finally, we would like to point out that Theorem 1 is essentially explicit if one has access to a computer. The eigenvalues of Σ_p are then numerically computable, and so are c, μ and σ. This is of course a very important property for the relevance of the theorem in applications.
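As an illustration of how explicit the theorem is, here is a minimal sketch (NumPy assumed; the function name is ours). It solves (1) by bisection and evaluates the centering and scaling. It assumes the normalization in which n^{2/3}(l_1/n − μ)/σ is compared with the Tracy-Widom law, with σ³ = c^{−3}(1 + (p/n)∫(λc/(1 − λc))³ dH_p(λ)). This form of σ is an assumption on our part, inferred from the reported numbers rather than copied from (3). With this normalization:

- Σ_p = Id gives back μ = (1 + √(p/n))² and σ = (1 + √(p/n))(1 + √(n/p))^{1/3};
- the sum-of-atoms model of Table 2 gives back the reported μ and σ.

```python
import numpy as np

def tw_constants(eigs, n, tol=1e-14, max_iter=500):
    """Return (c, mu, sigma) for population eigenvalues `eigs` and sample size n.

    c solves (1/p) sum_k (lam_k c / (1 - lam_k c))^2 = n / p on (0, 1/max(lam)),
    mu = (1 + (1/n) sum_k lam_k c / (1 - lam_k c)) / c,
    sigma^3 = (1 + (1/n) sum_k (lam_k c / (1 - lam_k c))^3) / c^3.
    """
    lam = np.asarray(eigs, dtype=float)
    p = lam.size
    lo, hi = 0.0, 1.0 / lam.max()
    # The left-hand side of (1) increases from 0 to +inf on (0, 1/lam_max): bisect.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        r = lam * mid / (1 - lam * mid)
        if np.mean(r**2) > n / p:
            hi = mid
        else:
            lo = mid
        if hi - lo <= tol * hi:
            break
    c = 0.5 * (lo + hi)
    r = lam * c / (1 - lam * c)
    mu = (1 + r.sum() / n) / c
    sigma = ((1 + (r**3).sum() / n) / c**3) ** (1 / 3)
    return c, mu, sigma

# Sum-of-atoms model of Table 2: p = 100, thirty eigenvalues at 10, seventy at 1.
atoms = np.r_[np.full(30, 10.0), np.ones(70)]
for n in (100, 400):
    c, mu, sigma = tw_constants(atoms, n)
    print(f"n = {n}: c = {c:.6f}, mu = {mu:.3f}, sigma = {sigma:.3f}")
```

Bisection is a deliberately simple choice here. The left-hand side of (1) is monotone in c, so no derivative information or starting guess is needed.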
Table 1

Toeplitz covariance matrix example

TW quantile   TW cdf   100 × 50   400 × 50   2 × SE
  -3.73        0.01     0.004      0.007      0.002
  -3.20        0.05     0.033      0.041      0.004
  -2.90        0.10     0.072      0.089      0.006
  -2.27        0.30     0.269      0.292      0.009
  -1.81        0.50     0.479      0.497      0.010
  -1.33        0.70     0.691      0.702      0.009
  -0.60        0.90     0.901      0.908      0.006
  -0.23        0.95     0.953      0.956      0.004
   0.48        0.99     0.991      0.992      0.002

The simulation mechanism was as follows:

1. We generated 10,000 random matrices X of size n × p (using Matlab). The rows of these matrices were i.i.d. N_C(0, Σ_p).
2. For each individual X, we computed l_1(X*X)/n and recentered and rescaled it according to Theorem 1.
3. From the 10,000 replications we obtained an empirical distribution F̂ for n^{2/3}(l_1/n − μ)/σ.

The columns of the table show the value of F̂ at the quantiles of the Tracy-Widom distribution (courtesy of Professor Iain Johnstone), which are given in the leftmost column. If the approximation were "perfect," the third and fourth columns would be equal to the second one.

Here we picked Σ_p = Toeplitz(1, 0.2, 0.3) and p = 50. For the 100 × 50 column, n = 100, μ = 3.7297 and σ = 3.9271. For the 400 × 50 column, n = 400, μ = 2.6559 and σ = 4.4288.
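The simulation mechanism just described can be sketched as follows. This is our own Python/NumPy illustration, not the authors' Matlab code, and the helper names are hypothetical. It draws the rows i.i.d. N_C(0, Σ_p) as ZΣ_p^{1/2}, where Z has i.i.d. standard complex Gaussian entries, and computes n^{2/3}(l_1/n − μ)/σ. It then evaluates the empirical distribution function together with twice its binomial standard error, 2√(F̂(1 − F̂)/N). For N = 10,000 this appears consistent with the "2 × SE" column. The usage example takes Σ_p = Id, where μ and σ have closed forms, and uses far fewer replications than the tables.

```python
import numpy as np

def simulate_statistic(Sigma, n, mu, sigma, reps, rng):
    """Draw `reps` values of n^{2/3} (l_1/n - mu) / sigma, rows of X i.i.d. N_C(0, Sigma)."""
    p = Sigma.shape[0]
    w, V = np.linalg.eigh(Sigma)
    root = (V * np.sqrt(w)) @ V.conj().T                 # Hermitian square root of Sigma
    stats = np.empty(reps)
    for r in range(reps):
        Z = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2)
        X = Z @ root                                     # rows ~ N_C(0, Sigma)
        l1 = np.linalg.eigvalsh(X.conj().T @ X)[-1]      # largest eigenvalue of X*X
        stats[r] = n ** (2 / 3) * (l1 / n - mu) / sigma
    return stats

def empirical_cdf_with_2se(stats, points):
    """Empirical CDF at `points` and twice its binomial standard error."""
    F = np.array([np.mean(stats <= q) for q in points])
    return F, 2 * np.sqrt(F * (1 - F) / stats.size)

n, p = 200, 100
mu = (1 + np.sqrt(p / n)) ** 2
sigma = (1 + np.sqrt(p / n)) * (1 + np.sqrt(n / p)) ** (1 / 3)
stats = simulate_statistic(np.eye(p), n, mu, sigma, reps=300, rng=np.random.default_rng(1))
F, two_se = empirical_cdf_with_2se(stats, [-3.73, -1.81, 0.48])
print(F, two_se)
```

For a non-identity Σ_p, μ and σ would have to be computed numerically from the spectrum of Σ_p, as described in Section 4.3.2.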


Table 2

Sum of atoms example

TW quantile   TW cdf   100 × 100   400 × 100   2 × SE
  -3.73        0.01      0.006       0.008      0.002
  -3.20        0.05      0.036       0.045      0.004
  -2.90        0.10      0.079       0.092      0.006
  -2.27        0.30      0.283       0.292      0.009
  -1.81        0.50      0.490       0.496      0.010
  -1.33        0.70      0.700       0.697      0.009
  -0.60        0.90      0.896       0.902      0.006
  -0.23        0.95      0.949       0.951      0.004
   0.48        0.99      0.991       0.992      0.002


The simulation mechanism is similar to the one described previously. We again did 10,000 repetitions of the experiment.

Here p = 100. Σ_p has λ_1 = ··· = λ_30 = 10 and λ_31 = ··· = λ_100 = 1. In the case n = 100, μ = 24.703 and σ = 21.871. In the case n = 400, μ = 16.417 and σ = 21.257.
APPENDIX A

A.1. Uniform control of Re(f(d(Σ_p, n, p))) and Re(−f(e(Σ_p, n, p))).

A.1.1. Case of Re(f(d(Σ_p, n, p))). Recall that we want to show that Re(f(d(Σ_p, n, p))) is bounded below. This guarantees that R_1, which appears in Lemma 1, is uniformly bounded. In the notation of Section 3.4.1, this is equivalent to showing that

m(2(1/α_1 − 1)) is bounded below.

We clearly have

$$m\big(2(1/\alpha_1 - 1)\big) \ge -\mu c\Big(\frac{1}{\alpha_1} - \frac{q}{c}\Big) - \frac{1}{2\gamma^2}\int \log\Big((1-\alpha)^2 - 2\Big(\frac{1}{\alpha_1} - 1\Big)\alpha(1-\alpha) + 4\alpha^2\Big(\frac{1}{\alpha_1} - 1\Big)^2\Big)\,dH_p(\lambda).$$

It is clear that (1 − α)² − 2(1/α_1 − 1)α(1 − α) + 4α²(1/α_1 − 1)² ≤ 1 + 4α²(1/α_1 − 1)², and this quantity is bounded. The same is true of 1/γ² and μ (because limsup α_1 < 1), and hence −μc(1/α_1 − q/c) is bounded below. Together, these arguments show that m(2(1/α_1 − 1)) is bounded below, which gives the control we need.

A.1.2. Case of Re(−f(e(Σ_p, n, p))). We now want to show that the quantity Re(−f(e(Σ_p, n, p))) is bounded below, so that R_2 (see Lemma 2) is bounded. In the notation of Section 3.5.1, we need to show that φ_1(2) is bounded below. This quantity is equal to

$$\phi_1(2) = -\frac{1}{2}\log(3) + \frac{1}{2\gamma^2}\int \log(1 + 3\alpha^2)\,dH_p(\lambda) - \frac{1}{2}\log(c^2) - \mu q.$$

Now |log(c²)| is bounded: c is bounded away from 0 (see Appendix B.1.1), c < 1/λ_1 ≤ 1/λ_p, and we assume that liminf λ_p > 0. The integral is nonnegative, and μ and q are bounded. Therefore φ_1(2) is bounded below and the needed control is shown.

A.2. About scaling and its connection to having a saddle point of order 2. We want to stress that n^{2/3} is the "natural" rate for convergence to Tracy-Widom limits, because there is a connection between Airy functions and saddle points of order 2. The few lines that follow are the natural heuristic explanations of steepest descent analysis. Similar arguments are given after (112) in [6]. We still thought it important to mention them again (and highlight the key parts), since they intuitively explain the connection between three things: having f''(c) = 0, an n^{2/3} scaling in Theorem 1, and a Tracy-Widom limit. In another context, the same connections were observed in [17].

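Recall that A_{n,p}(x) is a contour integral over Γ whose integrand involves e^{−n^{1/3}σ_{n,p}x(z−q)}, e^{−nμ_{n,p}(z−q)} and the factors 1/(1 − zλ_k). Collecting all factors other than the first exponential into e^{nf(z)}, we can write, up to a multiplicative constant,

$$A_{n,p}(x) = \frac{1}{2\pi i}\int_\Gamma e^{-n^{1/3}\sigma_{n,p}x(z-q) + nf(z)}\,dz.$$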


Now, because f'(c) = f''(c) = 0, we have f(z) ≈ f(c) + f^{(3)}(c)(z − c)³/6 around c. The point of the steepest descent analysis is to show that we then have (rigorously, and up to a precision we control)

$$A_{n,p}(x) \approx \frac{1}{2\pi i}\int_\Gamma e^{-n^{1/3}\sigma_{n,p}x(z-q) + n(f(c) + f^{(3)}(c)(z-c)^3/6)}\,dz.$$

Since we picked f^{(3)}(c) = 2σ_{n,p}³, we have

$$e^{-n^{1/3}\sigma_{n,p}x(z-q) + n(f(c) + f^{(3)}(c)(z-c)^3/6)} = e^{-n^{1/3}\sigma_{n,p}x(c-q) + nf(c)}\exp\Big(-x\,n^{1/3}\sigma_{n,p}(z-c) + \frac{n\sigma_{n,p}^3(z-c)^3}{3}\Big).$$

A key point is that the Airy function can be written, for an appropriately chosen contour L (see, e.g., [28], page 53), as

$$\mathrm{Ai}(x) = \frac{1}{2\pi i}\int_L \exp\Big(-xv + \frac{v^3}{3}\Big)\,dv.$$
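This representation can be checked numerically. In the sketch below (our own; NumPy assumed), L is taken to run from ∞e^{−iπ/3} to 0 and then out to ∞e^{iπ/3}. This is one standard admissible choice, because v³ = −r³ on both rays, so the integrand decays like e^{−r³/3}. The rays are truncated at r = 10 and integrated with the trapezoidal rule, and the value at x = 0 is compared with the closed form Ai(0) = 1/(3^{2/3}Γ(2/3)).

```python
import numpy as np
from math import gamma, pi

def airy_contour(x, r_max=10.0, m=20001):
    """Ai(x) = (1 / 2 pi i) int_L exp(-x v + v^3 / 3) dv, L from inf e^{-i pi/3} to inf e^{i pi/3}."""
    r = np.linspace(0.0, r_max, m)
    h = r[1] - r[0]
    total = 0j
    for sign in (+1, -1):                     # outgoing upper ray minus outgoing lower ray
        d = np.exp(sign * 1j * pi / 3)        # direction of the ray
        v = r * d
        g = np.exp(-x * v + v**3 / 3) * d     # integrand times dv/dr
        total += sign * h * (g.sum() - 0.5 * (g[0] + g[-1]))   # trapezoidal rule
    return (total / (2j * pi)).real

print(airy_contour(0.0), 1 / (3 ** (2 / 3) * gamma(2 / 3)))
print(airy_contour(1.0), airy_contour(-2.0))
```

The same contour, after the change of variable ζ = t(z), is the kind of "appropriate path" that the choice of Γ has to produce.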


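So the change of variable ζ = t(z) = n^{1/3}σ_{n,p}(z − c) becomes natural, and our integral can be rewritten as

$$A_{n,p}(x) \approx \frac{e^{-n^{1/3}\sigma_{n,p}x(c-q) + nf(c)}}{2\pi i\,n^{1/3}\sigma_{n,p}}\int_{t(\Gamma)} \exp\Big(-x\zeta + \frac{\zeta^3}{3}\Big)\,d\zeta.$$

Picking q = c − ε/(n^{1/3}σ_{n,p}) as in (14), we finally see that

$$A_{n,p}(x) \approx \frac{e^{nf(c)}}{n^{1/3}\sigma_{n,p}}\,e^{-\epsilon x}\,\frac{1}{2\pi i}\int_{t(\Gamma)} \exp\Big(-x\zeta + \frac{\zeta^3}{3}\Big)\,d\zeta.$$

The problem is then to pick a "good" Γ on which to analyze f, such that t(Γ) is an appropriate path from the point of view of the definition of the Airy function. The right-hand side now looks very much like e^{−εx}Ai(x)/K_{n,p} in the notation of Proposition 1. (Any overall minus sign is of course not a problem, since it is an artifact of the orientation of our contours.)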
A.3. Proof of Corollary 3 and examples of models belonging to 𝒢. In this subsection, we show that Theorem 1 holds under assumptions that are both reasonable from an applications standpoint and relatively easy to check. As in Theorem 1, we assume that 1 ≤ n/p, limsup n/p < ∞, limsup λ_1(Σ_p) < ∞ and liminf λ_p(Σ_p) > 0. As seen in Appendix B.1.1, these assumptions imply that liminf c > 0 and liminf λ_pc > 0. Our only problem will therefore be to check that

limsup λ_1c < 1.

We will use the notation α = λc, α_1 = λ_1c and γ² = n/p.

We consider covariance matrices Σ_p with spectral distribution H_p. We will treat two cases:

- H_p has an atom of mass ν(p) at λ_1;
- H_p converges weakly to a limit, and the endpoints of its support converge to the endpoints of the limiting support.


A.3.1. Case of H_p having an atom of mass ν(p) at λ_1. We assume that liminf ν(p) > 0. Note that λ_1(Σ_p) can vary in the analysis that follows; it just needs to be bounded. Since

$$\gamma^2 = \int \frac{\alpha^2}{(1-\alpha)^2}\,dH_p(\lambda) \ge \nu(p)\,\frac{\alpha_1^2}{(1-\alpha_1)^2},$$

simple algebra shows that

$$\alpha_1 \le \frac{1}{\sqrt{\nu(p)}/\gamma + 1}.$$

Recall that we assume that liminf ν(p) > 0 and limsup n/p < ∞. It is therefore clear that liminf √ν(p)/γ > 0, and hence

limsup α_1 < 1

in this situation. Therefore Theorem 1 applies.


A.3.2. Case of weak convergence of H_p with conditions on its support. We assume that:

1. H_p ⇒ H_∞ in the usual weak convergence sense.

2. λ_1(Σ_p) → sup support H_∞ = λ_1(∞). We assume that limsup λ_1(Σ_p) < ∞, so λ_1(∞) < ∞.

3. λ_p(Σ_p) → inf support H_∞ = λ_∞(∞), and λ_∞(∞) > 0.

4. In a (left) neighborhood of λ_1(∞), dH_∞(λ) has the property that dH_∞(λ) ≥ B(λ_1(∞) − λ)dλ, for some B > 0.

Hence the only property we have to show is that limsup λ_1c < 1.

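Now suppose H_p ⇒ H_∞ and H_∞ has a density. Note that for all x ∈ [0, −χ + 1/λ_1(∞)], for some χ > 0,

$$\lambda \mapsto \frac{(\lambda x)^2}{(1 - \lambda x)^2}$$

is a bounded continuous function of λ for λ ∈ [0, χ/2 + λ_1(∞)]. Hence, if we denote

$$f_p(x) = \int \frac{(\lambda x)^2}{(1 - \lambda x)^2}\,dH_p(\lambda),$$

we have

$$f_p(x) \to f_\infty(x) = \int \frac{(\lambda x)^2}{(1 - \lambda x)^2}\,dH_\infty(\lambda),$$

since, for p large enough, both H_p and H_∞ are supported in [0, χ/2 + λ_1(∞)].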
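Now suppose there exist B > 0 and λ_B such that dH_∞(λ)/dλ ≥ B(λ_1(∞) − λ) on [λ_B, λ_1(∞)]. Then, of course,

$$f_\infty(x) \ge \int_{\lambda_B}^{\lambda_1(\infty)} \frac{(\lambda x)^2}{(1 - \lambda x)^2}\,dH_\infty(\lambda) \ge B\lambda_B^2\int_{\lambda_B}^{\lambda_1(\infty)} \frac{\lambda_1(\infty) - \lambda}{(1/x - \lambda)^2}\,d\lambda.$$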


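Note that

$$v(x) = \int_{\lambda_B}^{\lambda_1(\infty)} \frac{\lambda_1(\infty) - \lambda}{(1/x - \lambda)^2}\,d\lambda = \log\frac{1/x - \lambda_B}{1/x - \lambda_1(\infty)} - 1 + \frac{1/x - \lambda_1(\infty)}{1/x - \lambda_B}.$$

Elementary manipulations show that v is a continuous, increasing function of x on (0, 1/λ_1(∞)), going from 0 to ∞.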
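The closed form follows from writing λ_1(∞) − λ = (1/x − λ) − (1/x − λ_1(∞)) and integrating each term. A quick numerical check of the formula and of the monotonicity claim is sketched below. SciPy's `quad` is assumed available, and the values λ_B = 1 and λ_1(∞) = 2 are purely illustrative choices of ours.

```python
import numpy as np
from scipy.integrate import quad

def v_closed(x, lam_b, lam1):
    """Closed form of v(x) for 0 < x < 1/lam1."""
    a = 1.0 / x
    return np.log((a - lam_b) / (a - lam1)) - 1.0 + (a - lam1) / (a - lam_b)

def v_numeric(x, lam_b, lam1):
    """v(x) by direct quadrature of (lam1 - lam) / (1/x - lam)^2 over [lam_b, lam1]."""
    val, _ = quad(lambda lam: (lam1 - lam) / (1.0 / x - lam) ** 2, lam_b, lam1,
                  epsabs=1e-14, epsrel=1e-12, limit=200)
    return val

lam_b, lam1 = 1.0, 2.0              # illustrative values; x ranges over (0, 1/lam1) = (0, 0.5)
xs = [0.01, 0.1, 0.3, 0.45, 0.49, 0.499]
closed = [v_closed(x, lam_b, lam1) for x in xs]
numeric = [v_numeric(x, lam_b, lam1) for x in xs]
print(np.allclose(closed, numeric, rtol=1e-8), np.all(np.diff(closed) > 0))
```

The values grow without bound as x approaches 1/λ_1(∞) = 0.5, through the logarithmic term. This is what drives f_∞(x) → +∞ in the next step.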

The definition of f_∞ implies that it is a continuous, nondecreasing function of x on the interval [0, 1/λ_1(∞)). Since

$$f_\infty(x) \ge B\lambda_B^2\,v(x),$$

we see that lim_{x→1/λ_1(∞)} f_∞(x) = +∞. Therefore, we can find b such that f_∞(b) = 2(1 + sup n/p), and b is bounded away from 1/λ_1(∞).

Now recall that f_p is a continuous, increasing function of x. Since f_p(c) = n/p and f_p(b) → 2(1 + sup n/p), we have c < b when p is large enough. But, for p large enough, λ_1c < λ_1b, and λ_1b → λ_1(∞)b < 1. Hence limsup λ_1c < 1.


A.3.3. Some simple examples of matrices for which Theorem 1 applies. 
We now justify the claims made after the statement of Corollary 3. We as¬ 
sume that limsupAi(Sp) < oo, liminf Ap(Sp) > 0, n >p and n/p is bounded. 
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Sums of atoms. Suppose Sp has a largest eigenvalue of multiplicity k{p) 
and that in the models under consideration liminf k{p)/p > 0. Then we just 
saw that Theorem 1 applies. 

Equally spaced eigenvalues on an interval. Suppose the covariance matrices $\Sigma_p$ in our models have eigenvalues that are equally spaced on a fixed interval $[\zeta, \xi]$. Suppose also that $n/p$ is bounded. Then it is clear that the conditions under which we worked in Appendix A.3.2 are satisfied, as long as $\zeta > 0$ and $\xi < \infty$. Hence Theorem 1 applies.
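To make the two examples concrete, the sketch below solves $f_p(c) = n/p$ by bisection, with $f_p(c) = \frac1p\sum_j (\lambda_j c/(1-\lambda_j c))^2$. It then prints $\lambda_1 c$ for growing $p$. The spectra are hypothetical choices for illustration: eigenvalues equally spaced on $[1, 2]$, and two atoms at 1 and 3, each of mass $1/2$. The ratio $n/p = 2$ is also an assumption. The conditions in the text are asymptotic, so this only shows $\lambda_1 c$ staying away from 1 along one sequence.

```python
def solve_c(eigs, n_over_p, rel_tol=1e-14):
    """Root c in (0, 1/lam_1) of f_p(c) = n/p, where
    f_p(c) = (1/p) * sum_j (lam_j c / (1 - lam_j c))**2.

    f_p is continuous and increasing on [0, 1/lam_1), f_p(0) = 0 and
    f_p(c) -> +infinity as c -> 1/lam_1, so bisection finds the unique root.
    """
    lam1 = max(eigs)
    p = len(eigs)

    def f_p(c):
        return sum((lam * c / (1.0 - lam * c)) ** 2 for lam in eigs) / p

    lo, hi = 0.0, 1.0 / lam1
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if f_p(mid) < n_over_p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n_over_p = 2.0
for p in (50, 200, 800):
    equally_spaced = [1.0 + j / (p - 1) for j in range(p)]  # eigenvalues on [1, 2]
    atoms = [1.0] * (p // 2) + [3.0] * (p - p // 2)          # top atom has mass 1/2
    for name, eigs in (("equally spaced", equally_spaced), ("atoms", atoms)):
        c = solve_c(eigs, n_over_p)
        print(f"p={p:4d} {name:15s} lambda_1 * c = {max(eigs) * c:.6f}")
```

For $\Sigma_p = \mathrm{Id}$ the equation reduces to $(c/(1-c))^2 = n/p$. For example, `solve_c([1.0] * 10, 4.0)` returns $2/3$.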

A.3.4. The case of Toeplitz matrices. Since we are working with covariance matrices, our matrices $\Sigma_p$ have to be symmetric and positive definite. Let us denote the parameters defining the Toeplitz matrix by $a_0, a_1, \ldots$. Not aiming for the greatest generality, we assume that
$$\sum_{k=1}^{\infty} k|a_k| < \infty.$$

Then the function
$$a(\omega) = a_0 + 2\sum_{k=1}^{\infty} a_k \cos(k\omega)$$
is $C^1$ on $[0, 2\pi]$. Hence it is bounded and continuous. This function plays an important role in understanding the limiting spectral distribution of Hermitian Toeplitz matrices. The results concerning Toeplitz matrices that we need are well known and classical. They can be found in [19], Chapter 5, [18], Chapter 4, and [8], Section 5.5.

Let us denote by $F$ the measure defined on the Borel sets of $\mathbb{R}$ by the following relation: if $E \subset \mathbb{R}$ is a Borel set,
$$F(E) = \frac{1}{2\pi}\,\mathrm{Leb}\{\omega \in [0, 2\pi] : a(\omega) \in E\},$$
where Leb denotes Lebesgue measure.

As before, we call $H_p$ the spectral measure of $\Sigma_p$, which is now a $p \times p$ Toeplitz matrix. We call $\lambda_1(\infty) = \sup \operatorname{supp} F$ and $\lambda_\infty(\infty) = \inf \operatorname{supp} F$.

Here is a collection of some relevant properties of symmetric Toeplitz matrices. Since $a$ is bounded on $[0, 2\pi]$, Corollary 5.12 in [8] gives $H_p \Rightarrow F$. Also, $a$ is piecewise continuous, so $\lambda_p(\Sigma_p) \to \lambda_\infty(\infty)$ and $\lambda_1(\Sigma_p) \to \lambda_1(\infty)$, using, for example, Theorem 5.14 in [8] or Lemma 4.2 in [18]. Finally, it is known ([18], Corollary 4.1 or [8], page 141) that if $F$ does not have any atoms, then its cumulative distribution function $D$ satisfies
$$D(x) = F((-\infty, x]) = \frac{1}{2\pi}\int_{a(\omega) \le x} d\omega.$$
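The facts just listed can be observed on a concrete symbol. The sketch assumes the illustrative choice $a_k = \rho^k$ with $\rho = 0.5$, so $\sum k|a_k| < \infty$ holds. The symbol is then $a(\omega) = (1-\rho^2)/(1 - 2\rho\cos\omega + \rho^2)$, with $\inf a = 1/3$ and $\sup a = 3$. The code checks three things: the eigenvalues of $\Sigma_p$ lie in $[\inf a, \sup a]$, $\lambda_1(\Sigma_p)$ is close to $\sup a$, and the fraction of eigenvalues below $x$ is close to $D(x)$.

```python
import numpy as np

rho, p = 0.5, 400  # illustrative AR(1)-type choice: a_k = rho**k


def symbol(omega):
    """a(omega) = a_0 + 2 sum_k a_k cos(k omega), summed in closed form for a_k = rho**k."""
    return (1 - rho**2) / (1 - 2 * rho * np.cos(omega) + rho**2)


idx = np.arange(p)
sigma_p = rho ** np.abs(idx[:, None] - idx[None, :])  # p x p symmetric Toeplitz matrix
eigs = np.linalg.eigvalsh(sigma_p)                     # ascending order

omega = np.linspace(0.0, 2 * np.pi, 200_000, endpoint=False)
a_vals = symbol(omega)


def D(x):
    """D(x) = (1/2pi) Leb{omega in [0, 2pi): a(omega) <= x}, estimated on a uniform grid."""
    return np.mean(a_vals <= x)


print("inf a, sup a      :", a_vals.min(), a_vals.max())
print("lambda_p, lambda_1:", eigs[0], eigs[-1])
for x in (0.5, 1.0, 2.0):
    print(f"x={x}: D(x)={D(x):.4f}  fraction of eigenvalues <= x: {np.mean(eigs <= x):.4f}")
```

For this symbol, $D(1) = 2/3$ exactly, because $a(\omega) \le 1$ if and only if $\cos\omega \le 1/2$.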
Recall our assumptions: $a$ is bounded away from 0, and its derivative changes sign only a finite number of times in $[0, 2\pi]$. Also, $F$ is assumed to not have atoms. Then we of course have
$$0 < \inf_{[0,2\pi]} a(\omega) = \lambda_\infty(\infty) \quad\text{and}\quad \sup_{[0,2\pi]} a(\omega) = \lambda_1(\infty) < \infty.$$

Also, we can split $[0, 2\pi]$ into, say, $m$ intervals $I_k$ where $a$ is monotonic. Calling $\rho_k$ the endpoints of the intervals (with $I_1 < I_2 < \cdots$ and $I_1 = [\rho_1, \rho_2]$), we have
$$D(x) = \sum_{k=1}^{m} \frac{1}{2\pi}\int_{\omega \in I_k:\, a(\omega) \le x} d\omega.$$


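The function $a$ is invertible on each $I_k$. Also, $\lambda_1(\infty)$ is reached and so there is at least one $k$, say $k_0$, for which $a(\rho_{k_0+1}) = \lambda_1(\infty)$. Further, we can assume without loss of generality that $a$ is nondecreasing on $I_{k_0}$. We call $a_{k_0}$ the restriction of $a$ to $I_{k_0}$; $a_{k_0}$ is an invertible function. Now, assuming that $a(\rho_{k_0+1}) > x > a(\rho_{k_0})$, we have
$$D_{k_0}(x) = \frac{1}{2\pi}\int_{\omega \in I_{k_0}:\, a(\omega) \le x} d\omega = \frac{1}{2\pi}\left(a_{k_0}^{-1}(x) - \rho_{k_0}\right).$$
Since $a_{k_0}^{-1}$ is $C^1$, $D_{k_0}$ has a derivative in $(a(\rho_{k_0}), a(\rho_{k_0+1}))$ and we have
$$D_{k_0}'(x) = \frac{1}{2\pi\, a'(a_{k_0}^{-1}(x))}.$$
We immediately see that on this interval
$$D_{k_0}'(x) \ge \frac{1}{2\pi \sup_{[0,2\pi]} |a'(\omega)|} > 0,$$
since $a$ is $C^1$.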

Hence, after we rewrite $D$ as a sum of $D_k$'s, we see that under our assumptions $D$ has a density except at a finite number of points where the derivative of $a$ changes sign. The density tends to $\infty$ at these points. In particular, in a left neighborhood of $\lambda_1(\infty)$ the density is bounded below by a positive constant, which gives property 4 above. So the assumptions put forth in Appendix A.3.2 hold and Theorem 1 applies to the class of Toeplitz covariance matrices we considered.

In general, if $a$ is a Lebesgue integrable function on $(-\pi, \pi)$ whose Fourier coefficients coincide with the $a_j$'s, and if $\operatorname{ess\,sup} a = M_a < \infty$ and $\operatorname{ess\,inf} a = m_a > 0$, Theorem 1 holds for such a Toeplitz matrix if
$$T(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{(x\,a(\omega))^2}{(1 - x\,a(\omega))^2}\, d\omega$$
is a continuous function of $x$ on $[0, 1/M_a)$ that is increasing and tends to $\infty$ as $x \to 1/M_a$. (Note that since $a > 0$ a.e., $T$ is nondecreasing in $x$.) This is a simple consequence of the so-called First (or Weak) Szegő limit theorem (see [19], pages 64-65) and of the fact that the eigenvalues of the corresponding (truncated) Toeplitz matrices are between $m_a$ and $M_a$ in this situation.
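The Szegő step can be illustrated numerically. Take $T(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} (x a(\omega))^2/(1 - x a(\omega))^2\, d\omega$, and assume the illustrative symbol of $a_k = \rho^k$ with $\rho = 0.5$, so that $M_a = 3$. The sketch compares $T(x)$ with its finite-$p$ analogue $\frac1p\sum_j (x\lambda_j/(1-x\lambda_j))^2$. It also shows $T$ increasing and blowing up as $x \uparrow 1/M_a$.

```python
import numpy as np

rho, p = 0.5, 400  # illustrative symbol from a_k = rho**k; M_a = (1 + rho)/(1 - rho) = 3
omega = np.linspace(-np.pi, np.pi, 400_000, endpoint=False)
a_vals = (1 - rho**2) / (1 - 2 * rho * np.cos(omega) + rho**2)
print("M_a is approximately", a_vals.max())


def T(x):
    """T(x) = (1/2pi) int (x a / (1 - x a))**2 d omega, via the periodic trapezoid rule."""
    r = x * a_vals
    return np.mean((r / (1 - r)) ** 2)


idx = np.arange(p)
eigs = np.linalg.eigvalsh(rho ** np.abs(idx[:, None] - idx[None, :]))


def f_p(x):
    """Finite-p analogue: (1/p) sum_j (x lambda_j / (1 - x lambda_j))**2."""
    r = x * eigs
    return np.mean((r / (1 - r)) ** 2)


for x in (0.1, 0.2, 0.3, 0.33):
    print(f"x={x}: T(x)={T(x):.5f}  f_p(x)={f_p(x):.5f}")
```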
A.4. Justification of results for spiked models with a small spike. Here we are considering "spiked" models of covariance. Namely, we start with a model $\{\Sigma_p, n, p\}_{n,p \in \mathbb{N}}$ that is in $\mathcal{G}$. In other words, Theorem 1 applies to this model. When we say that we are considering the spiked version of this model, we mean that we are now focusing on data matrices $X$ that are $n \times (p + k)$, with rows $X_i \sim \mathcal{N}_{\mathbb{C}}(0, \Sigma_{p+k})$, where $k(p) \le K$, $K \in \mathbb{N}$, and we add to $\{\lambda_1(\Sigma_p), \ldots, \lambda_p(\Sigma_p)\}$ $k$ eigenvalues larger than $\lambda_1(\Sigma_p)$. In other words,
$$\lambda_{k+1}(\Sigma_{p+k}) = \lambda_1(\Sigma_p).$$

A.4.1. Proof of Fact 1. The statement we want to prove is the following: in the "spiked" situation described above, if there exists $\chi > 0$ such that
$$\lambda_1(\Sigma_{p+k}) \le -\chi + 1/c(\Sigma_p, n, p),$$
Theorem 1 applies to $\Sigma_{p+k}$.

Proof. In order to simplify the notation we will use in this proof the shortcuts
$$\tilde c = c(\Sigma_{p+k}, n, p+k), \qquad c = c(\Sigma_p, n, p), \qquad \tilde\lambda_1 = \lambda_1(\Sigma_{p+k}).$$
It is clear that the only thing we have to check is that $\limsup \lambda_1(\Sigma_{p+k})\, c(\Sigma_{p+k}, n, p+k) < 1$.

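We of course have $\tilde c < 1/\tilde\lambda_1$. Now let us call
$$\varphi(x) = \int \left(\frac{\lambda x}{1 - \lambda x}\right)^2 dH_{p+k}(\lambda),$$
where $\varphi$ is defined on $[0, 1/\tilde\lambda_1)$. The equation that defines $\tilde c$ is
$$\varphi(\tilde c) = \frac{n}{p+k}, \quad\text{with } \tilde c \in [0, 1/\tilde\lambda_1).$$
We have seen that $\varphi$ is an increasing function of $x$. Now since $c(\Sigma_p, n, p) < 1/\tilde\lambda_1$, we can compute $\varphi(c)$. If we denote by $\tilde\lambda_j$'s the eigenvalues we have added to $\Sigma_p$ to create $\Sigma_{p+k}$, we have
$$\varphi(c) = \frac{1}{p+k}\left[\sum_{j=1}^{k}\left(\frac{\tilde\lambda_j c}{1 - \tilde\lambda_j c}\right)^2 + p\int\left(\frac{\lambda c}{1 - \lambda c}\right)^2 dH_p(\lambda)\right].$$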
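Now recall that by definition,
$$\int\left(\frac{\lambda c}{1 - \lambda c}\right)^2 dH_p(\lambda) = \frac{n}{p}.$$
Hence
$$\varphi(c) = \frac{1}{p+k}\left[\sum_{j=1}^{k}\left(\frac{\tilde\lambda_j c}{1 - \tilde\lambda_j c}\right)^2 + n\right] \ge \frac{n}{p+k} = \varphi(\tilde c).$$
Since $\varphi$ is an increasing function of $x$, this implies that
$$\tilde c \le c,$$
and therefore
$$\tilde\lambda_1 \tilde c \le \tilde\lambda_1 c \le 1 - \chi c.$$
Since $\liminf c > 0$ because $\{\Sigma_p, n, p\} \in \mathcal{G}$, we have shown
$$\limsup \tilde\lambda_1 \tilde c < 1$$
and Theorem 1 applies to $\Sigma_{p+k}$. □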
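The two inequalities at the heart of this proof, $\tilde c \le c$ and $\tilde\lambda_1 \tilde c \le 1 - \chi c$, can be observed on a small example. The spectrum of $\Sigma_p$ (equally spaced on $[1, 2]$), the values $p = 100$ and $n = 300$, and the three added eigenvalues between $\lambda_1(\Sigma_p)$ and $1/c - \chi$ are all hypothetical choices. The bisection solver is a sketch: it uses the fact that the defining function is increasing.

```python
def solve_c(eigs, n, rel_tol=1e-14):
    """Root c in (0, 1/max(eigs)) of (1/p) sum_j (l c / (1 - l c))**2 = n/p, with p = len(eigs)."""
    p, lam1 = len(eigs), max(eigs)
    lo, hi = 0.0, 1.0 / lam1
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if sum((l * mid / (1.0 - l * mid)) ** 2 for l in eigs) / p < n / p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p, n = 100, 300
base = [1.0 + j / (p - 1) for j in range(p)]    # Sigma_p: eigenvalues on [1, 2]
c = solve_c(base, n)
gap = 1.0 / c - max(base)                        # room between lambda_1(Sigma_p) and 1/c
chi = 0.25 * gap
spikes = [max(base) + t * gap for t in (0.3, 0.5, 0.7)]  # k = 3 added eigenvalues
spiked = base + spikes                           # spectrum of Sigma_{p+k}
c_tilde = solve_c(spiked, n)                     # equation with right-hand side n/(p+k)
lam1_tilde = max(spiked)

print(f"c = {c:.6f}, c_tilde = {c_tilde:.6f}")
print(f"lambda1_tilde * c_tilde = {lam1_tilde * c_tilde:.6f} <= 1 - chi*c = {1 - chi * c:.6f}")
```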


A.5. Issues of convergence in probability and a.s. convergence. We will explain in this subsection why, when Theorem 1 or 3 applies, we have
$$\frac{l_1}{n} - \mu(\Sigma_p, n, p) \to 0 \quad\text{in probability}$$
and
$$\frac{l_1}{n} - \mu(\Sigma_p, n, p) \to 0 \quad\text{a.s.}$$
The convergence in probability part is an immediate application of Slutsky's lemma (see [39], Lemma 2.8), so we will not belabor this point. The only thing we have to show is therefore the almost sure convergence part. We use concentration of measure arguments to show that $l_1/n - \mu(\Sigma_p, n, p) \to 0$ a.s.


Fact. If Theorem 1 or Theorem 3 applies,
$$\frac{l_1}{n} - \mu(\Sigma_p, n, p) \to 0 \quad\text{a.s.}$$

Proof. Let us first recall that the map that takes a matrix $M$ and returns its ordered singular values is 1-Lipschitz with respect to Euclidean norms (see, e.g., statement 7.3.8 in [22]). In other words, if we call $\{\sigma_j\}$ and $\{\tau_j\}$ the ordered singular values of two $n \times p$ matrices $A$ and $B$, we have
$$\sum_{k=1}^{p} (\sigma_k - \tau_k)^2 \le \sum_{i,j} |a_{ij} - b_{ij}|^2.$$
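The Lipschitz property of ordered singular values can be checked on random complex matrices. The sketch uses the fact that NumPy's `svd` returns singular values in descending order. It records the largest ratio of the left-hand side to the right-hand side over a few hundred trials. The matrix sizes and the perturbation scale are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def sv_sq_gap_and_frobenius(A, B):
    """Return (sum_k (sigma_k - tau_k)^2, sum_ij |a_ij - b_ij|^2)."""
    s_a = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    s_b = np.linalg.svd(B, compute_uv=False)
    return float(np.sum((s_a - s_b) ** 2)), float(np.sum(np.abs(A - B) ** 2))


n, p = 30, 10
worst_ratio = 0.0
for _ in range(200):
    A = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
    B = A + 0.1 * (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p)))
    lhs, rhs = sv_sq_gap_and_frobenius(A, B)
    worst_ratio = max(worst_ratio, lhs / rhs)
print("largest observed ratio lhs/rhs:", worst_ratio)
```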
In particular, this shows that the map that takes a vector of dimension $2np$, turns it into matrices $M$ and $N$ and returns the ordered singular values of $M + iN$ is 1-Lipschitz with respect to Euclidean norms. So is any 1-Lipschitz (for Euclidean norms) $\mathbb{R}^p \to \mathbb{R}$ function of the ordered singular values, and in particular the projection that returns $\sigma_1$ from $(\sigma_1, \sigma_2, \ldots, \sigma_p)$. Hence, by a fairly standard concentration of measure argument (see, e.g., [13], pages 34-38, and references therein, especially [21]), if we call $m(\Sigma_p, n, p)$ a median of $s_1 = \sqrt{l_1(X^*X)/n}$, and $\lambda_1 = \sup \lambda_1(\Sigma_p) < \infty$, we have, in the setting of Theorem 1 or 3,
$$\forall r > 0, \quad P(|s_1 - m(\Sigma_p, n, p)| > r) \le 2\exp(-nr^2/\lambda_1).$$

Note that the fact that the rows of our matrices are $\mathcal{N}_{\mathbb{C}}(0, \Sigma_p)$ plays a crucial role here, for we know the concentration function of Gaussian random variables and we also know that it has the so-called dimension-free concentration property. We refer the reader to [26], page 99, for more information about it. Let us just say that, for quite general distributions, the interplay between log-Sobolev inequalities and concentration of product measures is the gist of the argument that leads to the previous inequality. In the Gaussian case, we can also use the fact that the joint distribution of the entries of the $2np$ vector has a density of the type $\exp(-U)$ with $\mathrm{Hessian}(U) \ge 2\,\mathrm{Id}/\lambda_1$. [Recall that since we are working with complex standard entries, the rows of $M$ and $N$ are i.i.d. $\mathcal{N}(0, \Sigma_p/2)$.] Hence Theorem 2.8 in [26] applies and the concentration function for this measure is $\exp(-r^2/\lambda_1)$.

Combining it with the first Borel-Cantelli lemma (recall that $n$ is going to $\infty$), we see that
$$s_1 - m(\Sigma_p, n, p) \to 0 \quad\text{a.s.}$$
Since we know that $\mu(\Sigma_p, n, p)$ is uniformly bounded when Theorem 1 or Theorem 3 applies, we conclude that in this situation $\sqrt{\mu(\Sigma_p, n, p)}$ is, too.

Now because $s_1 - m(\Sigma_p, n, p) \to 0$ a.s., $s_1 - m(\Sigma_p, n, p) \to 0$ in probability. But we also know that $l_1/n - \mu(\Sigma_p, n, p) \to 0$ in probability. Therefore, $\sqrt{l_1/n} - \sqrt{\mu(\Sigma_p, n, p)} \to 0$ in probability, because, for instance, $\mu(\Sigma_p, n, p)$ is bounded below. And so $m(\Sigma_p, n, p) - \sqrt{\mu(\Sigma_p, n, p)} \to 0$.

Hence, there exists $K > 0$ such that $0 \le s_1 \le K$ a.s. Hence $s_1$ is (a.s.) uniformly bounded. So is $m(\Sigma_p, n, p)$, and hence
$$(s_1 - m(\Sigma_p, n, p))(s_1 + m(\Sigma_p, n, p)) = s_1^2 - m(\Sigma_p, n, p)^2 \to 0 \quad\text{a.s.}$$
We know that $m(\Sigma_p, n, p)^2 - \mu(\Sigma_p, n, p) \to 0$, because $\mu(\Sigma_p, n, p)$ is bounded above, so we have shown
$$\frac{l_1}{n} - m(\Sigma_p, n, p)^2 \to 0 \quad\text{a.s., and hence}\quad \frac{l_1}{n} - \mu(\Sigma_p, n, p) \to 0 \quad\text{a.s.}$$
□
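A small simulation illustrates the statement $l_1/n - \mu(\Sigma_p, n, p) \to 0$ in the simplest case $\Sigma_p = \mathrm{Id}$. In that case $\mu$ reduces to the classical Marchenko-Pastur edge $(1 + \sqrt{p/n})^2$. The ratio $p/n = 1/4$, the sample sizes and the seed are arbitrary choices. This single run illustrates the limit and does not test almost sure convergence.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.25                      # p/n, illustrative
mu = (1 + np.sqrt(gamma)) ** 2    # mu(Sigma_p, n, p) for Sigma_p = Id


def largest_eig_over_n(n, p):
    """l_1(X^* X)/n for X with i.i.d. N_C(0, Id_p) rows (so E|X_ij|^2 = 1)."""
    X = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2)
    return np.linalg.eigvalsh(X.conj().T @ X)[-1] / n


results = {}
for n in (100, 400, 1600):
    p = int(gamma * n)
    results[n] = largest_eig_over_n(n, p)
    print(f"n={n:5d}, p={p:4d}: l1/n = {results[n]:.4f}, mu = {mu:.4f}")
```

The deviation from $\mu$ should shrink roughly like $n^{-2/3}$, the Tracy-Widom scale.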
A.6. Determinantal character of the point process and consequences. In this section, we explain that when viewed as a point process on the real line, the eigenvalues of $X^*X$ form a determinantal point process. The main consequence is that when the rows $X_i \sim \mathcal{N}_{\mathbb{C}}(0, \Sigma_p)$ and the covariance models are in the class $\mathcal{G}$, the joint distribution of the $k$ largest eigenvalues of $X^*X$ ($k$ fixed) converges to its Tracy-Widom counterpart.

The fact that the point process is determinantal is an easy consequence of a well-known result that seems to first appear in [7], Section 2, and that directly applies to the situation we are considering, given the form of the density of the eigenvalues of $X^*X$ when the $X_i$ are $n$ i.i.d. $\mathcal{N}_{\mathbb{C}}(0, \Sigma_p)$. Proposition 2.1 in [6] shows that the kernel of this determinantal point process is $K_{n,p}$, where $K_{n,p}$ is defined in (4). We can now turn to the issue of the convergence of the joint distribution.


A.6.1. Convergence of the joint distribution. Let $B_j$ be disjoint, bounded below Borel sets of $\mathbb{R}$ and let $N_{B_j}$ denote the number of eigenvalues of $X^*X$ that are in $B_j$. As explained in Theorem 2 in [33] (see also (2.44) in [34]), the generating function of the joint distribution of the $N_{B_j}$'s can be written as the determinant of an operator. In our case, if we call $L = \sum_j (z_j - 1)\chi_{B_j}$, we have
$$E\Big(\prod_j z_j^{N_{B_j}}\Big) = \det(\mathrm{Id} + K_{n,p}L).$$
Using Lemma 2 in [34], if we can show that $\det(\mathrm{Id} + A_{n,p}B_{n,p}L) \to \det(\mathrm{Id} + \mathrm{Ai}^2 L)$, we will have shown the convergence of the joint distribution of the $k$ largest eigenvalues of $X^*X$ (properly recentered and rescaled) to their Tracy-Widom counterpart. (The argument is similar to the one given in the proof of Theorem 1, pages 1047-1048 in [34].)

Now recall that we showed that $A_{n,p}B_{n,p} \to E\,\mathrm{Ai}^2E^{-1}$ in trace class norm, in the notation of Lemma 3. Our only problem is therefore to show that $\det(\mathrm{Id} + E\,\mathrm{Ai}^2E^{-1}L) = \det(\mathrm{Id} + \mathrm{Ai}^2L)$. Note that since $L$ and $E^{-1}$ are multiplication operators, they commute. Also, recall that $E\,\mathrm{Ai}$ and $\mathrm{Ai}\,E^{-1}$ are Hilbert-Schmidt operators. Since $L$ is bounded, $\mathrm{Ai}\,LE^{-1}$ is Hilbert-Schmidt. Recall also that for Hilbert-Schmidt operators $F$ and $G$, $\det(\mathrm{Id} + FG) = \det(\mathrm{Id} + GF)$. Hence,
$$\begin{aligned}
\det(\mathrm{Id} + E\,\mathrm{Ai}^2E^{-1}L) &= \det(\mathrm{Id} + E\,\mathrm{Ai}^2LE^{-1}) = \det\big(\mathrm{Id} + (\mathrm{Ai}\,LE^{-1})(E\,\mathrm{Ai})\big)\\
&= \det(\mathrm{Id} + \mathrm{Ai}\,L\,\mathrm{Ai}) = \det(\mathrm{Id} + \mathrm{Ai}^2L).
\end{aligned}$$

We refer to [35] and [10] for information about the limiting distributions of $l_1, \ldots, l_k$.
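The determinant manipulations in A.6.1 rest on two facts: $\det(\mathrm{Id} + FG) = \det(\mathrm{Id} + GF)$, and conjugation by a multiplication operator that commutes with $L$. Both have direct finite-dimensional analogues, checked below with random matrices. Here diagonal matrices play the role of the multiplication operators $E$ and $L$, and a random symmetric matrix stands in for the kernel $\mathrm{Ai}(x+y)$. Everything is a toy stand-in, not the operators of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# det(Id + FG) = det(Id + GF) for rectangular F (m x r) and G (r x m).
m, r = 6, 4
F = 0.3 * rng.standard_normal((m, r))
G = 0.3 * rng.standard_normal((r, m))
lhs = np.linalg.det(np.eye(m) + F @ G)
rhs = np.linalg.det(np.eye(r) + G @ F)
print(lhs, rhs)

# Finite-dimensional analogue of the conjugation step: E and L diagonal
# ("multiplication operators"), so det(Id + E A^2 E^{-1} L) = det(Id + A^2 L).
x = np.linspace(0.0, 3.0, m)
E = np.diag(np.exp(0.5 * x))
E_inv = np.diag(np.exp(-0.5 * x))
L = np.diag(np.where(x > 1.0, -0.7, 0.0))  # (z - 1) times an indicator, with z = 0.3
A = 0.3 * rng.standard_normal((m, m))
A = 0.5 * (A + A.T)                         # symmetric toy kernel
det_conj = np.linalg.det(np.eye(m) + E @ A @ A @ E_inv @ L)
det_plain = np.linalg.det(np.eye(m) + A @ A @ L)
print(det_conj, det_plain)
```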
APPENDIX B: CONVERGENCE OF OPERATORS

In this section, we will prove Proposition 1 (which deals with the convergence of $A_{n,p}$) and sketch the proof of Proposition 2 (which does the same thing for $B_{n,p}$). The method of proof is similar to the proof of Proposition 3.1 in [6]. It might look a little simpler, because earlier in this paper we worked with more complicated functions $f$ than [6] did. So, from the point of view of this analysis, the efforts are in some sense balanced differently.

At this point, what we have to do is adapt the proof found in [6]. The difficult conceptual and technical problems, which required fresh ideas and a new look, were solved earlier in the paper. Now we principally need to rephrase parts of the work of [6] in a more general context, once the gist of the argument is understood in this general context. Note that our paths are slightly different from theirs, and there are a few other things to check. (In particular, we state Proposition 1 with an $\exp(-\varepsilon s/2)$ in the upper bound, independent of the interval $[-s_0, \infty)$ on which we work. We need to show that one can adapt the proof given in [6] to do this and not have $b(s_0)$, possibly dependent on $s_0$, instead of $\varepsilon$.)

We decided to include the full proof for three reasons. A sequence of references to various equations in [6], and modifications to make to those, would have made for very difficult reading. It would also have assumed that the reader had an enormous familiarity with [6]. So we decided to include this analysis for the convenience of the reader. Also, given the somewhat technical nature of the problem, having a completely spelled out proof considerably reduces the risk of errors.

Nevertheless, because of the length of the proofs, we will only give a complete proof for the convergence of $\kappa_{n,p}A_{n,p}$ to its limit. We will just sketch the corresponding proof for $B_{n,p}/\kappa_{n,p}$.

B.1. Preliminary remarks. We first recall the assumptions satisfied by models in $\mathcal{G}$. We assume:

1. $n/p$ is uniformly bounded and greater than or equal to 1.

2. $\limsup \lambda_1(\Sigma_p) < \infty$.

3. $\liminf \lambda_p(\Sigma_p) > 0$.

4. $\limsup \lambda_1 c < 1$.

Recall also that $f$, whose dependence on $(\Sigma_p, n, p)$ we choose not to highlight, is defined as
$$f(z) = -\mu(z - q) + \log(z) - \frac{p}{n}\int \log(1 - z\lambda)\, dH_p(\lambda).$$
Before we explain why the proof of Proposition 3.1 in [6] can be adapted to our problem under these assumptions, we need to show an intermediate result: the fact that under the above assumptions, $\liminf c > 0$.
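B.1.1. About $\liminf c$. The fact that $\limsup \lambda_1(\Sigma_p) < \infty$ implies that $\liminf c > 0$. As a matter of fact we have, writing $\lambda_1 = \lambda_1(\Sigma_p)$,
$$1 \le \frac{n}{p} = \int\left(\frac{\lambda c}{1 - \lambda c}\right)^2 dH_p(\lambda) \le \left(\frac{\lambda_1 c}{1 - \lambda_1 c}\right)^2,$$
and hence
$$\frac{1}{2\lambda_1} \le c.$$
This of course implies that
$$\liminf c(\Sigma_p, n, p) > 0,$$
since we assume $\limsup \lambda_1 < \infty$.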
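The bound $c \ge 1/(2\lambda_1)$, together with the way $c$ enters $f$, can be checked with finite differences. The definitions (2) and (3) of $\mu$ and $\sigma$ are not restated in this appendix. The sketch assumes the forms $\mu = \frac1c\big(1 + \frac pn\int \frac{\lambda c}{1-\lambda c}dH_p\big)$ and $\sigma^3 = \frac{1}{c^3}\big(1 + \frac pn\int (\frac{\lambda c}{1-\lambda c})^3 dH_p\big)$. Under that reading, $f'(c) = f''(c) = 0$ and $f^{(3)}(c) = 2\sigma^3$, which the code checks numerically. The spectrum and $n/p$ are illustrative choices.

```python
import numpy as np


def solve_c(eigs, n_over_p, rel_tol=1e-14):
    """Root of mean((l c / (1 - l c))**2) = n/p on (0, 1/max(eigs)), by bisection."""
    lo, hi = 0.0, 1.0 / eigs.max()
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        r = eigs * mid
        if np.mean((r / (1 - r)) ** 2) < n_over_p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p, n_over_p = 200, 2.0
eigs = np.linspace(1.0, 2.0, p)   # illustrative spectrum of Sigma_p
gamma = 1.0 / n_over_p            # p/n
c = solve_c(eigs, n_over_p)
r = eigs * c / (1 - eigs * c)
mu = (1 + gamma * np.mean(r)) / c                            # assumed form of (2)
sigma = ((1 + gamma * np.mean(r ** 3)) / c ** 3) ** (1 / 3)  # assumed form of (3)

q = 0.0  # the derivatives of f do not depend on q


def f(z):
    return -mu * (z - q) + np.log(z) - gamma * np.mean(np.log(1 - z * eigs))


h = 1e-3  # central finite differences
d1 = (f(c + h) - f(c - h)) / (2 * h)
d2 = (f(c + h) - 2 * f(c) + f(c - h)) / h ** 2
d3 = (f(c + 2 * h) - 2 * f(c + h) + 2 * f(c - h) - f(c - 2 * h)) / (2 * h ** 3)
print("c =", c, "  1/(2 lambda_1) =", 1 / (2 * eigs.max()))
print("f'(c) ~", d1, "  f''(c) ~", d2, "  f'''(c)/(2 sigma^3) ~", d3 / (2 * sigma ** 3))
```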


B.1.2. Key properties needed for the proof to go through. As explained in Section 3.6.2, there are four crucial points that will allow us to carry out the proof.

The first one is the fact that the lengths of $\Gamma$ and $\Xi$ are uniformly bounded when the (nonspiked) covariance models are in $\mathcal{G}$. It is clear that this is implied by the condition $\liminf \lambda_p c > 0$ (which is equivalent to $\liminf \lambda_p > 0$, since $c$ is bounded below) and the fact that $\Re(f(d))$ and $\Re(-f(e))$ are bounded below under our assumptions (which implies that $R_1$ and $R_2$ are bounded).

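The second very important point is that one needs to be able to find $\delta > 0$ such that
$$\forall s:\ |s - c| < \delta \implies \left|\frac{f^{(4)}(s)}{4!}\right|\delta < \frac{\sigma^3}{6}.$$
The importance of this property will become clear in the proof. Of course, this has to be uniform with respect to our models. In our context, calling
$$\limsup \lambda_1 c = \alpha_1 \quad\text{and}\quad \delta = \eta c,$$
and using $f^{(4)}(s)/4! = \frac14\big(-s^{-4} + \frac{p}{n}\int \lambda^4(1 - s\lambda)^{-4}\,dH_p(\lambda)\big)$, together with $|s| \ge (1-\eta)c$ and $|1 - s\lambda| \ge 1 - (1+\eta)\lambda_1 c$ when $|s - c| < \delta$, and $\sigma^3 \ge 1/c^3$, it is easy to see that this is implied by
$$\frac{\eta c}{4}\left[\frac{1}{(1-\eta)^4c^4} + \frac{p}{n}\,\frac{\alpha_1^4}{c^4(1 - (1+\eta)\alpha_1)^4}\right] < \frac{1}{6c^3},$$
or
$$\frac{\eta}{4}\left[\frac{1}{(1-\eta)^4} + \frac{p}{n}\,\frac{\alpha_1^4}{(1 - (1+\eta)\alpha_1)^4}\right] < \frac{1}{6}.$$
Since by assumption $\alpha_1 < 1$ and $p/n \le 1$, it is clear that we can find $\eta > 0$ such that the inequality appearing in the previous display is verified.

Therefore, $\delta = \liminf \eta c$ is bounded away from 0, since $\eta$ and $c$ both are.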
The assumptions $\limsup \lambda_1 c < 1$, $p/n \le 1$ and the fact that $\liminf c > 0$ imply that both $\mu$ and $\sigma$ [defined by (2) and (3)] are bounded, which ensures that for the same $\delta$
$$\limsup\ \sup_{s:\, |s - c| < \delta}\left|\frac{f^{(4)}(s)}{4!}\right| = A < \infty.$$
Finally, note that since $\alpha_1 < 1$, we can guarantee that the $\delta$ we pick is small enough that the disc of center $c$ and radius $\delta$ never encloses $d(\Sigma_p, n, p)$. [For obvious symmetry reasons, it also means that $\delta$ can be chosen small enough that $e(\Sigma_p, n, p)$ is not enclosed either.]


B.1.3. On Theorem 3. In this situation, we consider the case where a covariance model $\{\Sigma_p, n, p\}$ in $\mathcal{G}$ is spiked with $k$ eigenvalues at $1/c(\Sigma_p, n, p)$ ($k = 0$ is a possibility).

Using notation similar to what we used earlier, with $c = c(\Sigma_p, n, p)$, we will need to analyze the function
$$A_{\Sigma_{p+k}, n, p+k}(x) = \frac{n\sigma_{n,p}}{2\pi i}\int_\Gamma e^{-n\sigma_{n,p}x(z-q)}\,e^{nf_{n,p}(z)}\,\frac{c^k}{(c - z)^k}\,dz,$$
where $f_{n,p}$ is the function that appears in the analysis of $\{\Sigma_p, n, p\}$. We used the index $(n,p)$ to remove any ambiguity.

Similarly, we will have to study
$$B_{\Sigma_{p+k}, n, p+k}(y) = \frac{n\sigma_{n,p}}{2\pi i}\int_\Xi e^{n\sigma_{n,p}y(z-q)}\,e^{-nf_{n,p}(z)}\,\frac{(c - z)^k}{c^k}\,dz.$$
What we will show is that after proper scaling, these functions converge to limiting functions $H_{\infty,k}$ and $J_{\infty,k}$, defined below (and appearing first in (120) and (122) in [6]).

Theorem 3 is a generalization of Theorem 1.1(a) in [6] in the sense that it shows that the same limiting distributions $F_k$'s appear if we spike the covariance matrix at the "critical" eigenvalue. Note nevertheless that we do not recover exactly the same critical eigenvalue. We could have, if we had looked at a model of the type $\{\Sigma_{p+k+r}, n, p+k+r\}$, with $r$ eigenvalues $\tilde\lambda_{k+1}, \ldots, \tilde\lambda_{k+r}$ such that, for some $\chi > 0$, $\tilde\lambda_{k+1}, \ldots, \tilde\lambda_{k+r} \in [\chi, 1/c(\Sigma_p, n, p) - \chi]$. This would have added a little bit of technical difficulty to the proof we give later without any gain in understanding, since we already saw that the model $\{\Sigma_{p+r}, n, p+r\}$ (corresponding to $\Sigma_p$ and those $r$ "extra" eigenvalues) is in $\mathcal{G}$.
B.1.4. Limiting functions and limiting distributions. It seemed to us that slight (essentially "cosmetic") modifications of the functions introduced in [6], (120)-(122), were the most natural way to define them, especially when considering existing literature on Tracy-Widom limits. So we call
$$H_{\infty,k}(x) = \frac{1}{2\pi i}\int_{\Gamma_\infty} \exp(-ax + a^3/3)\,\frac{da}{a^k}. \tag{15}$$
Here, if we call $\varepsilon$ the positive real introduced earlier in the text, $\Gamma_\infty$ goes from $\infty e^{-i\pi/3}$ to $\infty e^{i\pi/3}$, goes through the real axis on the left of 0, stays in the region $\Re(z + \varepsilon) > 0$ and is symmetric about the real axis. It is oriented counterclockwise. In subsequent analysis, we will take $\Gamma_\infty$ to be the union of the half-line $te^{-i\pi/3}$, $\infty > t > \varepsilon/2$, the arc of circle of center 0 and radius $\varepsilon/2$, $(\varepsilon/2)e^{i\theta}$ for angles $\theta \in [\pi/3, 5\pi/3]$, and the half-line $te^{i\pi/3}$ for $\varepsilon/2 < t < \infty$.

Note that when $k = 0$, $H_{\infty,0}(x) = \mathrm{Ai}(x)$.
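For $k = 0$ the integrand has no singularity at 0. The contour of $H_{\infty,0}$ can then be deformed onto the two rays $a = te^{\pm i\pi/3}$, $t \ge 0$. Along these rays $a^3 = -t^3$, and the two contributions are complex conjugates. This gives $\mathrm{Ai}(x) = \frac{1}{\pi}\,\mathrm{Im}\big(e^{i\pi/3}\int_0^\infty e^{-t^3/3 - xte^{i\pi/3}}\,dt\big)$ for real $x$. The sketch evaluates this with the trapezoidal rule and compares it with known values of $\mathrm{Ai}(0)$ and $\mathrm{Ai}(1)$. The truncation point and the number of nodes are illustrative choices. For $k \ge 1$ the contour must stay to the left of 0, which this simple deformation does not handle.

```python
import cmath
import math


def airy_ai_contour(x: float, t_max: float = 10.0, m: int = 20_000) -> float:
    """Approximate Ai(x) = H_{infty,0}(x) = (1/2 pi i) int exp(-x a + a^3/3) da along
    the rays a = t e^{+-i pi/3}, using the trapezoidal rule on [0, t_max]."""
    w = cmath.exp(1j * math.pi / 3)
    h = t_max / m
    total = 0j
    for j in range(m + 1):
        t = j * h
        weight = 0.5 if j in (0, m) else 1.0
        total += weight * cmath.exp(-t ** 3 / 3 - x * t * w)
    upper_ray = w * total * h  # integral from 0 to infinity * e^{i pi/3}
    # The lower ray contributes minus the complex conjugate, so the full integral is
    # 2i * Im(upper_ray); dividing by 2 pi i leaves Im(upper_ray) / pi.
    return upper_ray.imag / math.pi


for x in (-2.0, 0.0, 1.0, 3.0):
    print(f"Ai({x:+.1f}) ~ {airy_ai_contour(x):.10f}")
```

For reference, $\mathrm{Ai}(0) = 3^{-2/3}/\Gamma(2/3) \approx 0.3550280539$ and $\mathrm{Ai}(1) \approx 0.1352924163$.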

Similarly, let
$$J_{\infty,k}(x) = \frac{1}{2\pi i}\int_{\Xi_\infty} \exp(ax - a^3/3)\,a^k\,da. \tag{16}$$
Here, the contour is restricted to the region $\Re(z + \varepsilon) < 0$, goes from $\infty e^{-2i\pi/3}$ to $\infty e^{2i\pi/3}$ and is symmetric about the real axis. It is also oriented counterclockwise. In subsequent analysis, we will take it to be the union of the half-line $te^{-2i\pi/3}$, $3\varepsilon < t < \infty$, the arc of circle $3\varepsilon e^{i\theta}$, $\theta \in [2\pi/3, 4\pi/3]$, and the half-line $te^{2i\pi/3}$, $3\varepsilon < t < \infty$.

Note that $\Xi_\infty$ is strictly to the left of $\Gamma_\infty$.

Finally, using (206) in [6], it is clear that $|e^{-\varepsilon x}H_{\infty,k}(x)| \le Ke^{-\varepsilon x/2}$ on $[s_0, \infty)$, for some $K > 0$. Using (205) there, we get similarly that $e^{\varepsilon x}J_{\infty,k}(x) = O(e^{-\varepsilon x/2})$ on $[s_0, \infty)$, for all $s_0 > -\infty$.

Hence, if we call $\tilde H_{\infty,k}(x) = e^{-\varepsilon x}H_{\infty,k}(x)$ and $\tilde J_{\infty,k}(x) = e^{\varepsilon x}J_{\infty,k}(x)$, we see that the operators on $L^2([0, \infty))$ with kernels $\tilde H_{\infty,k}(x + y + s)$ and $\tilde J_{\infty,k}(x + y + s)$ are Hilbert-Schmidt, for any fixed $s$. Hence their product is trace class.

The cumulative distribution functions $F_k$'s mentioned in Theorem 3 are connected to $H_{\infty,k}$ and $J_{\infty,k}$ in the following manner. If we call, by a slight abuse of notation, $\tilde H_{\infty,k}$ the operator with kernel $\tilde H_{\infty,k}(x + y + s)$ on $L^2[0, \infty)$ and similarly $\tilde J_{\infty,k}$ the operator with kernel $\tilde J_{\infty,k}(x + y + s)$ on the same space, then
$$F_k(s) = \det(I - \tilde H_{\infty,k}\tilde J_{\infty,k}).$$
Note that as explained in [6], this quantity is well defined and independent of $\varepsilon$; $\varepsilon$ is just here to ensure that $\tilde H_{\infty,k}$ is Hilbert-Schmidt.
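For $k = 0$, $H_{\infty,0} = J_{\infty,0} = \mathrm{Ai}$. Since the conjugation by $e^{\pm\varepsilon x}$ does not change the determinant, $F_0(s) = \det(I - A_s^2)$, where $A_s$ has kernel $\mathrm{Ai}(x + y + s)$ on $L^2[0, \infty)$. This is the GUE Tracy-Widom distribution. The sketch below evaluates this Fredholm determinant by a Nyström discretization with Gauss-Legendre nodes, in the spirit of Bornemann's quadrature method. That technique, the truncation of $[0, \infty)$ to $[0, 12]$ and the node counts are our assumptions, not part of the argument above. $\mathrm{Ai}$ is computed from the two-ray contour representation of $H_{\infty,0}$.

```python
import numpy as np


def airy_ai(x, t_max=6.0, m=2000):
    """Ai(x) for a 1-D array x via Ai(x) = (1/pi) Im(w int_0^inf exp(-t^3/3 - x t w) dt),
    w = exp(i pi/3): the contour of H_{infty,0} deformed onto two rays through 0."""
    x = np.asarray(x, dtype=float)
    w = np.exp(1j * np.pi / 3)
    t = np.linspace(0.0, t_max, m + 1)
    vals = np.exp(-t ** 3 / 3 - np.outer(x, t) * w)  # shape (len(x), m + 1)
    h = t[1] - t[0]
    integral = h * (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1]))
    return (w * integral).imag / np.pi


def tw2_cdf(s, n_nodes=40, length=12.0):
    """F_0(s) = det(I - A_s^2), A_s the operator with kernel Ai(x + y + s) on
    L^2[0, inf), truncated to [0, length] and discretized with Gauss-Legendre nodes."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    x = 0.5 * length * (nodes + 1.0)
    sw = np.sqrt(0.5 * length * weights)
    kernel = np.empty((n_nodes, n_nodes))
    for i in range(n_nodes):  # one row at a time keeps memory small
        kernel[i] = airy_ai(x[i] + x + s)
    A = sw[:, None] * kernel * sw[None, :]
    return float(np.linalg.det(np.eye(n_nodes) - A @ A))


for s in (-4.0, -3.0, -1.8, 0.0, 2.0):
    print(f"F_0({s:+.1f}) = {tw2_cdf(s):.6f}")
```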
B.2. Convergence of $A_{n,p}$. We work in the general case where there is a root of multiplicity $k$ at $c(\Sigma_p, n, p)$. We nevertheless will not highlight this dependence on $k$, to simplify the notation.

Denoting by $f$ the function defined in (11) and corresponding to $\{\Sigma_p, n, p\}$, we call $\kappa_{n,p} = \exp(-nf(c))/(-\sigma n^{1/3}c)^k$. Since we are in the case where $\sigma_{n,p} = \sigma/n^{2/3}$, we have
$$A_{\Sigma_{p+k}, n, p+k}(s) = A_{n,p}(s) = \frac{\sigma n^{1/3}}{2\pi i}\int_\Gamma e^{-\sigma n^{1/3}s(z-q)}\,e^{nf(z)}\,\frac{c^k}{(c - z)^k}\,dz.$$
Hence,
$$\tilde A_{n,p}(s) = \kappa_{n,p}A_{n,p}(s) = \frac{1}{2\pi i}\,\frac{1}{(\sigma n^{1/3})^{k-1}}\int_\Gamma e^{-\sigma n^{1/3}s(z-q)}\,e^{n(f(z) - f(c))}\,\frac{dz}{(z - c)^k}.$$


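The aim of this subsection is to show the following lemma.

Lemma B.1. Let $f$ satisfy Lemma 1. Suppose
$$\exists \delta > 0:\ \forall s,\ |s - c| < \delta \implies \left|\frac{f^{(4)}(s)}{4!}\right|\delta < \frac{\sigma^3}{6}.$$
Then
$$\forall s_0 \in \mathbb{R},\ \exists C(s_0),\ \exists N_0(s_0):\ s > s_0,\ n > N_0 \implies |\kappa_{n,p}A_{n,p}(s) - \exp(-\varepsilon s)H_{\infty,k}(s)| \le \frac{C(s_0)\exp(-\varepsilon s/2)}{n^{1/3}}.$$
We now turn to proving Lemma B.1.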

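B.2.1. Notation. We call $\mathcal{C}$ the circle of center $c$ and radius $\delta$. We call $\mathcal{D}$ the corresponding disc. We split $\Gamma$ into $\Gamma = \mathcal{G}^{(1)} \cup \mathcal{G}^{(2)}$, where $\mathcal{G}^{(1)}$ is the part of $\Gamma$ that is inside $\mathcal{D}$ (see Figure 4). Note that under our assumptions about $\delta$ and $d$, the intersection of $\Gamma$ and $\mathcal{C}$ is on $\Gamma_1 \cup \bar\Gamma_1$, that is, on a section of $\Gamma$ where this contour is parametrized as $c + te^{\pm i\pi/3}$, $t \in \mathbb{R}$.

We call $\Gamma_\infty^{(1)}$ the image of $\mathcal{G}^{(1)}$ under the map $z \mapsto \sigma n^{1/3}(z - c)$. Of course, everything has been done so that this is a subset of $\Gamma_\infty$. Let us denote
$$\Gamma_\infty^{(2)} = \Gamma_\infty \setminus \Gamma_\infty^{(1)}.$$
Recall that we called $\tilde A_{n,p}(x) = \kappa_{n,p}A_{n,p}(x)$. Let $\tilde A_{n,p}(x) = \tilde A_{n,p}^{(1)}(x) + \tilde A_{n,p}^{(2)}(x)$, where the superscript $(i)$ indicates that $\tilde A_{n,p}^{(i)}(x)$ is the contribution of the integral defining $\tilde A_{n,p}(x)$ over $\mathcal{G}^{(i)}$.

We similarly split $H_{\infty,k}$ into $H_{\infty,k} = H_{\infty,k}^{(1)} + H_{\infty,k}^{(2)}$, where now the superscripts refer to the contribution of the integrals over $\Gamma_\infty^{(1)}$ and $\Gamma_\infty^{(2)}$.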
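B.2.2. Preliminary computations. Recall that $\sigma$ is uniformly bounded (away from 0 and $\infty$) for models in the class $\mathcal{G}$. Since we supposed that $|f^{(4)}(s)/4!|\,\delta < \sigma^3/6$ and $\delta$ is bounded away from 0, it is clear that there exists $0 < A < \infty$ such that $\sup_{|s-c|<\delta}|f^{(4)}(s)/4!| < A$, uniformly for our models.

We now turn to bounding a quantity that is key in the analysis. We have, for any complex number $w$, $\Re(w) \le |w|$. Therefore, since $f$ has two vanishing derivatives at $c$ ($f'(c) = f''(c) = 0$), we have by Taylor's theorem, for $z$'s such that $|z - c| < \delta$,
$$\begin{aligned}
\Re\Big(f(z) - f(c) - \frac{f^{(3)}(c)}{3!}(z - c)^3\Big) &\le \Big|f(z) - f(c) - \frac{f^{(3)}(c)}{3!}(z - c)^3\Big| \le \Big(\sup_{|s-c|<\delta}\Big|\frac{f^{(4)}(s)}{4!}\Big|\Big)|z - c|^4\\
&\le \Big(\sup_{|s-c|<\delta}\Big|\frac{f^{(4)}(s)}{4!}\Big|\Big)\delta\,|z - c|^3 < \frac{\sigma^3}{6}|z - c|^3,
\end{aligned}$$
because $|z - c| < \delta$. Recall that $f^{(3)}(c) = 2\sigma^3$. Hence when $z \in \mathcal{D}$,
$$\Re(f(z) - f(c)) < \frac{\sigma^3}{6}\big(2\Re((z - c)^3) + |z - c|^3\big).$$
In particular, when $z \in \mathcal{D}$ and $z = c + te^{i\pi/3}$, $\Re(f(z)) < f(c) - t^3\sigma^3/6$.

Now recall that because $\Re(f)$ is decreasing on $\Gamma_1$ and since $d \notin \mathcal{D}$, $\Re(f(d)) < \Re(f(c + \delta e^{i\pi/3}))$. If $z$ is in $\mathcal{G}^{(2)}$, either it is on $\Gamma_1$ or $\Re(z) > \Re(d)$. In the latter case, $\Re(f(d)) > \Re(f(z))$ because $f$ satisfies Lemma 1. In the former, we can use the fact that $\Re(f(z))$ is decreasing on $\Gamma_1$. We finally get that for $z \in \mathcal{G}^{(2)}$,
$$\Re(f(z)) \le \Re(f(c + \delta e^{i\pi/3})) < f(c) - \sigma^3\delta^3/6.$$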

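We will now split the analysis into three parts corresponding to different regions of $\Gamma$.

B.2.3. Behavior of our functions on $\mathcal{G}^{(2)}$ and $\Gamma_\infty^{(2)}$. We first focus on $\tilde A_{n,p}^{(2)}(x)$. By definition,
$$\tilde A_{n,p}^{(2)}(x) = \frac{1}{2\pi i}\,\frac{1}{(\sigma n^{1/3})^{k-1}}\int_{\mathcal{G}^{(2)}} e^{-n^{1/3}\sigma x(z-q)}\,e^{n(f(z) - f(c))}\,\frac{dz}{(z - c)^k}.$$
Hence,
$$|\tilde A_{n,p}^{(2)}(x)| \le \frac{1}{2\pi(\sigma n^{1/3})^{k-1}}\int_{\mathcal{G}^{(2)}} e^{-n^{1/3}\sigma x\Re(z-q)}\,e^{n\Re(f(z) - f(c))}\,\frac{|dz|}{|c - z|^k}.$$
Now on $\mathcal{G}^{(2)}$, $|c - z| \ge \delta$ and $\Re(f(z) - f(c)) < -\sigma^3\delta^3/6$. So we only have to pay close attention to $n^{1/3}\sigma x\Re(z - q)$.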
Suppose $x \in [-s_0, \infty)$, with $s_0 > 0$. If $x \ge 0$, then $-\sigma x\Re(z - q) \le -\sigma x\delta/2 \le \sigma s_0R_1 - \sigma x\delta/2$. If $x < 0$, then $-\sigma x\Re(z - q) \le \sigma s_0R_1$, because $\Re(q) > 0$ and we saw that $R_1$ can be chosen to be independent of our models (i.e., uniform with respect to them). We of course also have $-\sigma x\Re(z - q) \le \sigma s_0R_1 - \sigma x\delta/2$ when $x < 0$.

So, since the length $L_{\mathcal{G}^{(2)}}$ of $\mathcal{G}^{(2)}$ is uniformly bounded, because the length $L_\Gamma$ of $\Gamma$ is,
$$|\tilde A_{n,p}^{(2)}(x)| \le \frac{L_\Gamma}{2\pi(\sigma n^{1/3})^{k-1}\delta^k}\,e^{n^{1/3}\sigma(s_0R_1 - x\delta/2)}\,e^{-n\sigma^3\delta^3/6}.$$
We deduce from this that for all $x$ in $[-s_0, \infty)$, $s_0 > 0$,
$$|\tilde A_{n,p}^{(2)}(x)| \le \frac{C(s_0)e^{-\varepsilon x/2}}{n^{1/3}}$$
when $n$ is large enough. Note also that $C(s_0)$ can be chosen to be a continuous nonincreasing function of $s_0$.

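We now turn to $H_{\infty,k}^{(2)}$. Note that if $a \in \Gamma_\infty^{(2)}$, then $a = te^{\pm i\pi/3}$ and $t > \delta\sigma n^{1/3}$.

Recall that
$$H_{\infty,k}^{(2)}(x) = \frac{1}{2\pi i}\int_{\Gamma_\infty^{(2)}} \frac{e^{-xa + a^3/3}}{a^k}\,da.$$
Hence,
$$|H_{\infty,k}^{(2)}(x)| \le \frac{1}{2\pi}\int_{\Gamma_\infty^{(2)}} \frac{e^{-x\Re(a) + \Re(a^3)/3}}{|a|^k}\,|da| \le \frac{1}{\pi}\int_{\delta\sigma n^{1/3}}^{\infty} \frac{e^{-xt/2 - t^3/3}}{t^k}\,dt.$$
Now note that for $s_0 > 0$, $x > -s_0$ and $t > \varepsilon$, $e^{-xt/2} \le e^{-x\varepsilon/2}e^{s_0t/2}$. We conclude from the last display that, when $n$ is large enough,
$$|\exp(-\varepsilon x)H_{\infty,k}^{(2)}(x)| \le C(-s_0)\,e^{-\varepsilon x/2}\,e^{-n\sigma^3\delta^3/6},$$
where again $C(y)$ can be chosen to be a nonincreasing continuous function of $y$.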

As an aside, let us go back to the point we raised in the main text about $f$ not being defined when we cross the real axis. What we just did is to take the modulus of the quantity that appears inside the integral taken over $\mathcal{G}^{(2)}$. When we worked on $\Re(f)$, we essentially focused on these quantities, since $\Re(\log(z)) = \log(|z|)$ when the log is defined. So the analysis we did for $\Re(f)$ applies to the situation when we first take the modulus of the quantity of interest, and hence we are rid of the problem created by the fact that the log is not defined when we cross the real axis at $R_1$.
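B.2.4. Behavior of the difference of our functions on $\mathcal{G}^{(1)}$. We first note that after changing variables through $a = \sigma n^{1/3}(z - c)$, $\mathcal{G}^{(1)}$ is transformed into $\Gamma_\infty^{(1)}$. In other words, after doing this change of variables,
$$H_{\infty,k}^{(1)}(x) = \frac{\sigma n^{1/3}}{2\pi i}\int_{\mathcal{G}^{(1)}} \frac{e^{-x\sigma n^{1/3}(z-c) + n\sigma^3(z-c)^3/3}}{(n^{1/3}\sigma(z - c))^k}\,dz.$$
We also have $\exp(-\varepsilon x) = \exp(n^{1/3}\sigma(q - c)x)$, and therefore
$$\exp(-\varepsilon x)H_{\infty,k}^{(1)}(x) = \frac{\sigma n^{1/3}}{2\pi i}\int_{\mathcal{G}^{(1)}} \frac{e^{-\sigma n^{1/3}x(z-q) + n\sigma^3(z-c)^3/3}}{(n^{1/3}\sigma(z - c))^k}\,dz.$$
Hence we have
$$|\tilde A_{n,p}^{(1)}(x) - \exp(-\varepsilon x)H_{\infty,k}^{(1)}(x)| \le \frac{\sigma n^{1/3}}{2\pi}\int_{\mathcal{G}^{(1)}} e^{-\sigma n^{1/3}x\Re(z-q)}\,\frac{|e^{n(f(z)-f(c))} - e^{n\sigma^3(z-c)^3/3}|}{|n^{1/3}\sigma(z - c)|^k}\,|dz|.$$

• The case $z \in \Gamma_0$.

Recall that if $z \in \Gamma_0$, then $z - c = e^{i\theta}\varepsilon/(2\sigma n^{1/3})$, $\theta \in [\pi/3, 5\pi/3]$. We call
$$\mathcal{J}_{\Gamma_0}(x) = \frac{\sigma n^{1/3}}{2\pi}\int_{\mathcal{G}^{(1)}\cap\Gamma_0} e^{-\sigma n^{1/3}x\Re(z-q)}\,\frac{|e^{n(f(z)-f(c))} - e^{n\sigma^3(z-c)^3/3}|}{|n^{1/3}\sigma(z - c)|^k}\,|dz|.$$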

Note that for $u, v \in \mathbb{C}$, $|e^u - e^v| \le \max(|e^u|, |e^v|)\,|u - v|$. This is easily seen if we write $\gamma(t) = v + (u - v)t$ and note that $e^u - e^v = \int_0^1 e^{\gamma(t)}\gamma'(t)\,dt$. Then, $|e^{\gamma(t)}| = \exp(\Re(v) + t\Re(u - v))$, and the result follows.
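The elementary inequality $|e^u - e^v| \le \max(|e^u|, |e^v|)\,|u - v|$ is easy to probe numerically. The sketch samples random complex pairs in a box and records the largest observed ratio of the two sides, which should never exceed 1. The sampling box and the number of trials are arbitrary choices.

```python
import cmath
import random

random.seed(3)
worst = 0.0
for _ in range(10_000):
    u = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    v = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    lhs = abs(cmath.exp(u) - cmath.exp(v))
    rhs = max(abs(cmath.exp(u)), abs(cmath.exp(v))) * abs(u - v)
    worst = max(worst, lhs / rhs)
print("largest observed |e^u - e^v| / (max(|e^u|, |e^v|) |u - v|):", worst)
```

The ratio approaches 1 only when $\Re(u) \approx \Re(v)$ and $|u - v|$ is small. In that regime $e^{\gamma(t)}$ has nearly constant modulus along the segment.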

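Hence, using the computations made in Section B.2.2,
$$\begin{aligned}
|e^{n(f(z)-f(c))} - e^{n\sigma^3(z-c)^3/3}| &\le \max\big(e^{n\Re(f(z)-f(c))}, e^{n\sigma^3\Re((z-c)^3)/3}\big)\, n\Big|f(z) - f(c) - \frac{\sigma^3}{3}(z - c)^3\Big|\\
&\le e^{n\sigma^3(2\Re((z-c)^3) + |z-c|^3)/6}\, nA|z - c|^4.
\end{aligned}$$
We also have $\sigma n^{1/3}(z - c) = e^{i\theta}\varepsilon/2$, and therefore
$$e^{n\sigma^3(2\Re((z-c)^3) + |z-c|^3)/6}\, nA|z - c|^4 \le e^{\varepsilon^3/16}\,\frac{A\varepsilon^4}{16\sigma^4n^{1/3}}.$$
In other respects, $\Re(z - q)\,\sigma n^{1/3} = \sigma n^{1/3}(\Re(z - c) + \Re(c - q)) = \varepsilon(1 + \cos(\theta)/2)$. We also note that the length of $\Gamma_0$ is $4\pi\varepsilon/(6\sigma n^{1/3})$. Therefore, we conclude that
$$\mathcal{J}_{\Gamma_0}(x) \le C(-s_0)\exp(-\varepsilon x/2)\,\frac{\sigma n^{1/3}}{2\pi}\,\frac{4\pi\varepsilon}{6\sigma n^{1/3}}\left(\frac{2}{\varepsilon}\right)^k\frac{A\varepsilon^4e^{\varepsilon^3/16}}{16\sigma^4n^{1/3}},$$
where $C$ can be chosen to be a continuous nonincreasing function. In other words, for $x \in [-s_0, \infty)$, when $n$ is large enough,
$$\mathcal{J}_{\Gamma_0}(x) \le \frac{C'(-s_0)\exp(-x\varepsilon/2)}{n^{1/3}}.$$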

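• The case $z \in \Gamma_1$.

When $z \in \Gamma_1 \cap \mathcal{G}^{(1)}$, $z = c + te^{i\pi/3}$ with $\varepsilon/(2\sigma n^{1/3}) \le t \le \delta$.

We call
$$\mathcal{J}_{\Gamma_1}(x) = \frac{\sigma n^{1/3}}{2\pi}\int_{\mathcal{G}^{(1)}\cap\Gamma_1} e^{-\sigma n^{1/3}x\Re(z-q)}\,\frac{|e^{n(f(z)-f(c))} - e^{n\sigma^3(z-c)^3/3}|}{|n^{1/3}\sigma(z - c)|^k}\,|dz|.$$
Going through the same steps as before, we find that
$$|e^{n(f(z)-f(c))} - e^{n\sigma^3(z-c)^3/3}| \le e^{n\sigma^3(2\Re((z-c)^3) + |z-c|^3)/6}\, nA|z - c|^4 = e^{-n\sigma^3t^3/6}\, nAt^4.$$
In other respects, $\sigma n^{1/3}\Re(z - q) = tn^{1/3}\sigma/2 + \varepsilon$. Therefore, for $x \in [-s_0, \infty)$, with $s_0 > 0$, we have $-x\sigma n^{1/3}\Re(z - q) \le s_0(tn^{1/3}\sigma/2 + \varepsilon) - \varepsilon x$. Hence,
$$\mathcal{J}_{\Gamma_1}(x) \le \frac{\sigma n^{1/3}}{2\pi}\,e^{s_0\varepsilon - \varepsilon x}\int_{\varepsilon/(2\sigma n^{1/3})}^{\delta} e^{s_0tn^{1/3}\sigma/2}\,e^{-n\sigma^3t^3/6}\,\frac{nAt^4}{(n^{1/3}\sigma t)^k}\,dt.$$
After changing variables to $u = \sigma n^{1/3}t$, we get
$$\mathcal{J}_{\Gamma_1}(x) \le \frac{A\,e^{s_0\varepsilon - \varepsilon x}}{2\pi\sigma^4n^{1/3}}\int_{\varepsilon/2}^{\infty} e^{s_0u/2 - u^3/6}\,u^{4-k}\,du.$$
Hence, here again, we can find a continuous nonincreasing function $C$ such that for $x \ge -s_0$, $s_0 > 0$ and for $n$ large enough,
$$\mathcal{J}_{\Gamma_1}(x) \le \frac{C(-s_0)e^{-\varepsilon x/2}}{n^{1/3}}.$$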

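B.2.5. Conclusion. The expression $|\kappa_{n,p}A_{n,p}(x) - e^{-\varepsilon x}H_{\infty,k}(x)|$ was our initial center of interest. We have the simple bound
$$\begin{aligned}
|\kappa_{n,p}A_{n,p}(x) - e^{-\varepsilon x}H_{\infty,k}(x)| &\le |\tilde A_{n,p}^{(1)}(x) - \exp(-\varepsilon x)H_{\infty,k}^{(1)}(x)| + |\tilde A_{n,p}^{(2)}(x)| + \exp(-\varepsilon x)|H_{\infty,k}^{(2)}(x)|\\
&\le \mathcal{J}_{\Gamma_0}(x) + \mathcal{J}_{\Gamma_1}(x) + |\tilde A_{n,p}^{(2)}(x)| + |\exp(-\varepsilon x)H_{\infty,k}^{(2)}(x)|\\
&\le \frac{C(-s_0)e^{-\varepsilon x/2}}{n^{1/3}}
\end{aligned}$$
for $C$ a nonincreasing continuous function. This bound is valid if $x \in [-s_0, \infty)$, $s_0 > 0$ and when $n$ is large enough.

So Lemma B.1 is shown.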
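B.3. Convergence of $B_{n,p}$. We again work in the general case where there is a root of multiplicity $k$ at $c(\Sigma_p, n, p)$. With notation similar to the ones above, we have
$$B_{n,p}(s) = \frac{\sigma n^{1/3}}{2\pi i}\int_\Xi e^{\sigma n^{1/3}s(z-q)}\,e^{-nf(z)}\,\frac{(c - z)^k}{c^k}\,dz.$$
Hence,
$$\tilde B_{n,p}(s) = B_{n,p}(s)/\kappa_{n,p} = \frac{(\sigma n^{1/3})^{k+1}}{2\pi i}\int_\Xi e^{\sigma n^{1/3}s(z-q)}\,e^{n(f(c) - f(z))}\,(z - c)^k\,dz.$$
The aim of this subsection is to show the following lemma.

Lemma B.2. Let $f$ satisfy Lemma 2. Suppose
$$\exists \delta > 0:\ \forall s,\ |s - c| < \delta \implies \left|\frac{f^{(4)}(s)}{4!}\right|\delta < \frac{\sigma^3}{6}.$$
Then
$$\forall s_0 \in \mathbb{R},\ \exists C(s_0),\ \exists N_0(s_0):\ s > s_0,\ n > N_0 \implies |B_{n,p}(s)/\kappa_{n,p} - e^{\varepsilon s}J_{\infty,k}(s)| \le \frac{C(s_0)\exp(-\varepsilon s/2)}{n^{1/3}}.$$
We now turn to proving Lemma B.2.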


B.3.1. Notation and preliminary computations. Recall that we denote by $\mathcal{C}$ the circle of center $c$ and radius $\delta$. We call $\mathcal{D}$ the corresponding disc. We split $\Xi$ into $\Xi = \mathcal{X}^{(1)} \cup \mathcal{X}^{(2)}$, where $\mathcal{X}^{(1)}$ is the part of $\Xi$ that is inside $\mathcal{D}$. Note that under our assumptions about $\delta$ and $e$, the intersection of $\Xi$ and $\mathcal{C}$ is on $\Xi_1 \cup \bar\Xi_1$, that is, on a section of $\Xi$ where this contour is parametrized as $c + te^{\pm 2i\pi/3}$, $t \in \mathbb{R}$.

We call $\Xi_\infty^{(1)}$ the image of $\mathcal{X}^{(1)}$ under the map $z \mapsto \sigma n^{1/3}(z - c)$. Of course, everything has been done so that this is a subset of $\Xi_\infty$. Let us denote
$$\Xi_\infty^{(2)} = \Xi_\infty \setminus \Xi_\infty^{(1)}.$$
Let us call $\tilde B_{n,p}(x) = B_{n,p}(x)/\kappa_{n,p}$ and $\tilde B_{n,p}(x) = \tilde B_{n,p}^{(1)}(x) + \tilde B_{n,p}^{(2)}(x)$, where the superscript $(i)$ indicates that $\tilde B_{n,p}^{(i)}(x)$ is the contribution of the integral defining $\tilde B_{n,p}(x)$ over $\mathcal{X}^{(i)}$.

We similarly split $J_{\infty,k}$ into $J_{\infty,k} = J_{\infty,k}^{(1)} + J_{\infty,k}^{(2)}$, where now the superscripts refer to the contribution of the integrals over $\Xi_\infty^{(1)}$ and $\Xi_\infty^{(2)}$.

Note that using the same arguments as before, we have, inside $\mathcal{D}$,
$$\Re(f(c) - f(z)) < \frac{\sigma^3}{6}\big(-2\Re((z - c)^3) + |z - c|^3\big).$$
So for $z = c + te^{2i\pi/3}$, $\Re(f(c) - f(z)) < -\sigma^3t^3/6$. Also, by arguments similar to the ones we used before, we have
$$\max_{z \in \mathcal{X}^{(2)}} \Re(-f(z)) \le \Re(-f(c + \delta e^{2i\pi/3})) < -f(c) - \frac{\sigma^3}{6}\delta^3.$$

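B.3.2. Behavior of our functions on $\mathcal{X}^{(2)}$ and $\Xi_\infty^{(2)}$. We first focus on $\tilde B_{n,p}^{(2)}(x)$. Note that for $z \in \mathcal{X}^{(2)}$, we have $|z - c| \le \sqrt{R_2^2 + (R_2 + c)^2}$ and
$$-(R_2 + c) \le \Re(z - q) \le -\varepsilon/(2\sigma n^{1/3}).$$
The second inequality comes from the fact that $\Xi_0$ is a circle of radius $3\varepsilon/(\sigma n^{1/3})$. Now we have
$$|\tilde B_{n,p}^{(2)}(x)| \le \frac{(\sigma n^{1/3})^{k+1}}{2\pi}\int_{\mathcal{X}^{(2)}} e^{\sigma n^{1/3}x\Re(z-q)}\,e^{n\Re(f(c) - f(z))}\,|z - c|^k\,|dz|.$$
Because for $x$ in $[-s_0, \infty)$, $x\Re(z - q) \le -x\varepsilon/(2\sigma n^{1/3}) + s_0(R_2 + c)$, we conclude, using the results put forth in the previous subsection, that
$$|\tilde B_{n,p}^{(2)}(x)| \le \frac{C(s_0)e^{-\varepsilon x/2}}{n^{1/3}}.$$
On the other hand, since $\Xi_\infty^{(2)}$ is parametrized as $z = te^{\pm 2i\pi/3}$ with $\delta\sigma n^{1/3} \le t < \infty$, we obtain along the lines of the proof done for $H_{\infty,k}^{(2)}$ that
$$|e^{\varepsilon x}J_{\infty,k}^{(2)}(x)| \le C(-s_0)\,e^{-\varepsilon x/2}\,e^{-n\sigma^3\delta^3/6}.$$

B.3.3. Behavior of the difference of our functions on $\mathcal{X}^{(1)}$. Here again, by the change of variables $a = n^{1/3}\sigma(z - c)$, $\mathcal{X}^{(1)}$ is mapped to $\Xi_\infty^{(1)}$. Hence we can write
$$|\tilde B_{n,p}^{(1)}(x) - e^{\varepsilon x}J_{\infty,k}^{(1)}(x)| \le \frac{\sigma n^{1/3}}{2\pi}\int_{\mathcal{X}^{(1)}} e^{\sigma n^{1/3}x\Re(z-q)}\,|e^{n(f(c)-f(z))} - e^{-n\sigma^3(z-c)^3/3}|\,|n^{1/3}\sigma(z - c)|^k\,|dz|.$$
Splitting the problem into first $\Xi_0$ and then $\Xi_1$, and repeating the approach used in the study of $\tilde A_{n,p}^{(1)}$, together with the new estimates of $f(c) - f(z)$, we get
$$|\tilde B_{n,p}^{(1)}(x) - e^{\varepsilon x}J_{\infty,k}^{(1)}(x)| \le \frac{C(-s_0)e^{-\varepsilon x/2}}{n^{1/3}}.$$
The combination of this bound and those for $\tilde B_{n,p}^{(2)}$ and $e^{\varepsilon x}J_{\infty,k}^{(2)}$ shows Lemma B.2.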